This document is current as of Mnet v0.7.
CVS revision: $Id: new_filesystem.html,v 1.123 2004/07/17 11:33:25 arnowa Exp $
The design recorded in this document has been implemented, except the feature of packing the entire file into the inode block isn't implemented.
There is one known buglet in this design -- if the file were packed into the inode block, and the decryption key were incorrect, then there is a tiny chance, which I currently estimate to be around 2^-64, that the user could see a file of random gibberish (that is, whatever you get by decrypting the ciphertext with that incorrect key) rather than an error message. I haven't analyzed it more closely. We've decided to add an HMAC tag at the end of the inode in order to guarantee the file's integrity. This change has not yet been written into this document nor implemented.
Choose a "user-supplied key seed". If you want the merge duplicate encrypted blocks feature (see "merge duplicate encrypted blocks", below), then choose your key seed to be the zero-length string. Generate the AES encryption key (see "encryption key generation", below).
Take a file and break it into a bunch of shares using an erasure code. Encrypt the shares, and a "shareId" is the SHA-1 hash of an encrypted share. Put all the shareIds together and call the result an "inode". Encrypt the inode, and the "inodeId" is the SHA-1 hash of the encrypted inode. An mnet URI is an inodeId concatenated with the symmetric encryption key, and mnet-base-32 encoded.
Note that everything is encrypted with a key, and the key is present in only one place -- the mnet URI. This offers cryptographic protection of the property that if you don't have the mnet URI, then all of the blocks (whether inode blocks or data blocks) are opaque to you. You can't even tell which blocks are inode blocks and which blocks are data blocks, except with statistical analysis of usage patterns.
It might be desirable that the same file stored by two different users should yield the same encrypted blocks. This increases the reliability and reduces the storage requirements. But it also reduces privacy, in that it allows someone who has a copy of the file to determine if certain blocks are part of that file. If the user desires this feature, they can choose the user-supplied key seed to be the string with zero length. If they do not desire this feature, they should choose a key seed that is unguessable, even to someone who has a copy of the file.
The symmetric cipher encryption key is equal to the first 128 bits of the SHA-1 hash of a key generation struct. A key generation struct is a string that looks like this:
((1:M2:M)(1:K2:K)(22:user-supplied key seedlen_of_seed:seed)(18:sha-1 hash of file20:hash_of_file)(7:purpose83:Part of http://mnet.sf.net/. Generate encryption keys from user-supplied key seeds.))
M, K are each two bytes long, seed is variable length, len_of_seed is the length of seed in bytes expressed in decimal notation, and hash_of_file is 20 bytes long.
(This is Rivest-style s-expression format. That is: every item is either a parenthesized list of items, or else it is a string of bytes preceded by its length and a colon character.)
The first eight bytes of an inode look like this:
[vernum (1 byte)] [file length (7 bytes)]
In this version of ZNFF, the "vernum" field always contains a 0 byte. Then, if the file is small enough to fit into this inode (i.e., if file length <= len(inode) - 8), then the file contents are present in the inode, and any remaining space is padded with zeros. In that case, the remainder of the inode looks like this:
[file contents (variable length == file length)] [zero padding (variable length == to the end of the inode)]
Else, the remainder of the inode look like this:
[K (2 bytes)] [M (2 bytes)] [blockId (20 bytes)] * M [zero padding (variable length == to the end of the inode)]
(That means there are exactly M instances of the "blockId" field.)
We currently use Rizzo's erasure code and (modified and optimized) C implementation.
(Search in text for "erasure codes".)
Here is a pseudocode recipe for encoding a file into ZNFF.
Requirements: B, the block size, must fall into a few size classes. This facilitates privacy, and it can also ease management of lots of blocks in a local store (which is something block servers have to do). The minimum size for a block is 2^16 bytes (arbitrarily chosen), and it must be a power of two (arbitrarily chosen). K must be >= 2 (else we're just doing replication, not erasure coding) and M must be >= K. K+M must be < 2^8 (this is a constraint of the erasure code). B*K must be >= F (the file length) (in order to hold the contents of the file). BI has the same constraints as B.Actually I don't think the erasure code requires K+M < 2^8. The fec.c source code simply checks that K <= 2^8 and M <= 2^8. I must have misunderstood the paper, or else the source code isn't bothering to check the constraint properly!
The result will be the smallest block size B that can hold the file, and the smallest share num K for that block size.
There is only one hard limit on file size: the "file length" field is 7 bytes == 56 bits, so the largest file can only be (2^56) - 1 bytes long.
Note that in order for a file to be retrievable, it must be the case that any K of the shares are retrievable and the whole, unique inode itself. How to accomplish this is out of the scope of this document, which says nothing about such things. It would be possible to perform erasure coding on the inode itself, but then you would have the same problem: you would have to have any K of the shares of the file, plus any K-prime of the shares of the inode, and the whole, unique object which contains the list of the shareIds of the shares of the inode. You could solve this problem by making that object be the mnet URI itself, i.e. by putting the list of the shareIds of the inode into the mnet URI, but this would violate desideratum 4. Indeed, you can see that the problem of the inode not being erasure-coded is a direct and unavoidable consequence of desideratum 4.
If you want to increase your file's survivability and the speed with which it downloads, and you are willing to relinquish desideratum 4, you can give someone the full inode (minus the zero padding) instead of giving them the mnet URI. You can mix-and-match mnetURIs and inodes freely and dynamically, for example, you can store a file (including storing the padded, encrypted inode block) and then later you can decide whether to give someone a copy of the inode or the mnet URI.
So far the only thing we've taken care of is the actual data. If you start with the mnet URI (or the inode and the encryption key) then you can get the file contents. But in actual use, there is always some metadata or other that you also want.
An especially important piece of metadata is type. Currently this format doesn't store the filename extension or MIME type of the file. Users who download stuff from Mnet would probably like their file to open in the appropriate application, or to save it to their filesystem under a reasonable file name. One option is to use something like the file(1) utility to tell what the file is on the receiving side. Two problems with that are that file(1) is fallible, and it does not offer a suggested extension with which to construct a filename to save the file.
One problem with putting metadata in the inode is that whenever the metadata differs it causes the mnet URI to be different.
General options for metadata:
<a type="text/plain" href="mnet:38ppp56jbb8b64zrh8reoadzgn1zpdxc76enkmqduwtf4tug">a short story I wrote</a>. (Yes, that
typeattrib is valid HTML 4.0 acording to indexdot. No idea if it's supported in any browsers.)
<mnet:38ppp56jbb8b64zrh8reoadzgn1zpdxc76enkmqduwtf4tug> <http://purl.org/dc/elements/1.1/format> "text/plain", <mnet:38ppp56jbb8b64zrh8reoadzgn1zpdxc76enkmqduwtf4tug> <http://purl.org/dc/elements/1.1/title> "my body is a simulation". Then store that file, and give out its mnet URI instead of the data file's mnet URI.
type=text/plain\ntitle=my body is a simulation. Then store that file, and give out a combination of the data file mnet URI plus the metadata file mnet URI, like this: <a href="mnet:38ppp56jbb8b64zrh8reoadzgn1zpdxc76enkmqduwtf4tug|z8nbpsio1nisuap1f6946ome437chdxosyznes6koetaz6yy>a short story I wrote</a>.
We've currently agreed to leave all file metadata (even type metadata and immutable metadata) out of the inode for now because (a) there are several good possibilities for how to put this metadata in a higher layer that we want to explore and (b) we can change our minds and add it into the inode later if none of those possibilities turns out well enough. --Zooko 2003-03-11
Thanks to Myers Carpenter, Hauke Johannknecht, Jim McCoy, Luke Nelson for help with this design. Extra thanks to Myers for proposing a fix to the problem of the encryption system falling apart in the presence of re-used keys.