"ent" -- next-generation network design for Mnet

CVS revision: $Id: ent.html,v 1.123 2004/07/17 11:33:25 arnowa Exp $

about this document

This document is current as of Mnet "v0.7.0.32-UNSTABLE (has newnet merge)", circa 2003-10-06.

author: Zooko

I took a three-month break from developing ent, and when I returned and started trying to finish the implementation, I had a whole bunch of questions about the design. I realized that other hackers who are learning the ent design will have similar questions, and that this is a golden opportunity to write a doc -- the person with the questions is also the person with the answers! At least, I hope I'll have the answers after I read my own three-months-ago source code, run the unit tests, and think back to what I was doing when I wrote it.

This doc will hopefully take you from understanding the basic concepts of Mnet v0.6 (circa 2003) to understanding the design of ent, and it will hopefully leave you in an excellent position from which to learn the implementation of ent by reading the source code.

desiderata

I want a decentralized datastore and a routing scheme which are:

In addition, I would really like one which is:

In addition, I would really like to be able to add new services easily:

history and related work

Ent is a decentralized datastore based on an extended variant of a DHT (distributed hash table), similar to proposals such as the Chord File System (CFS), PAST, and OceanStore. Ent is most directly inspired by Kademlia. Ent will differ from these in several ways, but these differences are subtle, and descriptions of them will not fit into this "basic idea" paragraph. Of course, Ent also owes a debt to Mojo Nation.

the basic idea, architecture

topology

Imagine a really big binary tree. The left half of the tree contains ent nodes and data blocks which have a 0 bit in the first bit of their ID. The right half of the tree contains ent nodes and data blocks which have a 1 bit in the first bit of their ID. If you go down into the left half of the tree, because the first bit of your ID was 0, then you proceed into the leftmost quarter if the second bit of your ID is 0, or into the next quarter over if the second bit of your ID is 1. It's simple -- it's just a great big binary tree where the positions of nodes and data blocks are determined by the bits of their IDs.
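
To make the picture concrete, here is a toy sketch in Python (mine, not taken from the ent source) of how an ID's bits pick out a position in the tree; the example ID is just the SHA1 hash of the string "test":

    import hashlib

    def tree_path(node_id, depth=4):
        # bit 0 of the 160-bit ID picks the half, bit 1 picks the
        # quarter within that half, and so on down the tree
        steps = []
        for i in range(depth):
            bit = (node_id >> (159 - i)) & 1
            steps.append('right' if bit else 'left')
        return steps

    example_id = int(hashlib.sha1(b"test").hexdigest(), 16)
    print(tree_path(example_id))
    # ['right', 'left', 'right', 'left'] -- this ID begins 1010...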

Now the basic idea in Kademlia (and earlier in Pastry) is that you can efficiently route from any point to any other point in this tree if the following condition holds: each node has one link to a node in the other half of the tree from it, and one link to a node in the same half of the tree but in the other quarter, and one link to a node in the same quarter but the other eighth, and so forth. From now on we'll call this condition the "minimally-well-connected condition".
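
The condition is easy to state in code. In this hedged sketch (the helper names are mine), the "subtree index" of a peer relative to you is the position of the first bit where your IDs differ: index 0 is the other half of the tree, index 1 is your half but the other quarter, and so on. In a network of N nodes only about log2(N) of the 160 possible indices are populated, and the condition says each populated index must be covered by at least one link.

    def subtree_index(my_id, peer_id):
        # 0 = other half, 1 = my half but the other quarter,
        # 2 = my quarter but the other eighth, and so forth
        return 160 - (my_id ^ peer_id).bit_length()

    def covered_subtrees(my_id, peer_ids):
        # minimally-well-connected means: every subtree index that
        # the network actually populates appears in this list
        return sorted(set(subtree_index(my_id, p) for p in peer_ids))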

routing

Routing is very simple: you route to the peer which is closest to the target. Which one is the "closest", exactly? The one who lives in the smallest subtree that includes the target. If this doesn't make sense to you yet, load the Kademlia paper (referenced above) and look at the diagram on page 3.
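
"Lives in the smallest subtree that includes the target" has a tidy arithmetic form: it means the peer whose ID has the smallest XOR distance to the target -- this is Kademlia's metric, since a smaller XOR means a longer shared prefix and hence a smaller shared subtree. A sketch:

    def closest_peer(peer_ids, target_id):
        # the peer in the smallest subtree containing the target is
        # exactly the peer minimizing XOR distance to it
        return min(peer_ids, key=lambda p: p ^ target_id)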

network maintenance

Network maintenance isn't quite so simple, but it isn't too complicated either -- you just use the routing service to discover peers so that you always satisfy the minimally-well-connected condition -- you always know (at least) one peer in the other half of the tree, at least one in your half but the other quarter, and so forth.
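
Here is a hedged sketch of what that maintenance might look like (an assumed shape, not the actual ent code; it reuses subtree_index from the sketch above). For each subtree level where we know no peer, we route a query toward an ID inside that subtree; flipping the corresponding bit of our own ID yields such an ID. Queries into subtrees that turn out to be empty simply find the nearest existing node, which is harmless.

    def refill_missing_subtrees(my_id, peer_ids, send_look_for_peer):
        covered = set(subtree_index(my_id, p) for p in peer_ids)
        for level in range(160):
            if level not in covered:
                # flip bit `level` of our own ID: any node whose ID
                # shares that prefix lives in the missing subtree
                send_look_for_peer(my_id ^ (1 << (159 - level)))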

But then there are some more complications, such as the prevalence of firewalled or NAT'ed nodes, and the need for routing that is real-world-efficient -- for example, we want to avoid taking intercontinental hops when high-speed local routes would have worked.

We will have to tweak the design in order to handle these problems, but some of the design space has already been mapped out for us by Pastry and Kademlia. But it gets trickier -- read on!

data storage

Data storage is hard in the ad hoc Internet setting that we are aiming at. We require real-world efficiency and robustness. The individual nodes are unreliable, unpredictable, and very heterogeneous in their capacities. It is unthinkable to do mass copying of stored data from one node to another as nodes come and go, so instead we'll have to add another layer on top of the basic routing so that we can find data which isn't in the exactly-right place anymore due to the network topology changing out from under it. We also need redundancy in connectedness, so that as nodes come and go the network maintenance protocol can preserve the minimally-well-connected condition on which both routing and this added data-finding layer depend.
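
As a hedged sketch of the "most naive possible" storage layer mentioned under "development process" below (the replication factor is my assumption, not ent's): place each block on the few nodes currently closest to its ID, so that a fetcher probing several near nodes can still find it after moderate churn.

    import hashlib

    REPLICAS = 3  # hypothetical redundancy parameter

    def block_id(block):
        # blockIds are the SHA1 hash of the block (see "the design" below)
        return int(hashlib.sha1(block).hexdigest(), 16)

    def store_block(block, peer_ids, send_store):
        bid = block_id(block)
        for peer in sorted(peer_ids, key=lambda p: p ^ bid)[:REPLICAS]:
            send_store(peer, block)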

other services

We might want to use the same binary tree to provide other services than data storage, such as messaging and metadata search/discovery. Those things might be easily plugged in as different uses of the identical routing scheme, or they might not. (Some of the other Mnet hackers are more confident than I am that they will be easy to add.)

advanced topics

incentives

Nodes require some motivation to supply other nodes with services, instead of acting solely as parasitic nodes that request services without providing them. When Mojo Nation was invented, this was an article of faith. Today, it is a fact of life. The nature of Pastry/Kademlia/Ent relationships provides a possible solution to this conundrum: simple bilateral trading. There are two reasons why Pastry/Kademlia/Ent topology enables this: first, all links are symmetric -- if I can use my link to you to satisfy my requirement to know a peer in the opposite half of the tree, then you can use your link to me to satisfy your requirement likewise. Second, each node is required to know only a few other nodes. Specifically, the number of nodes that you are required to keep links to is logarithmic in the size of the entire network.
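
To put numbers on "logarithmic": with N nodes spread over the 160-bit ID space, only about log2(N) subtree levels are populated, so that is roughly how many links each node must keep.

    import math

    for n in (10**3, 10**6, 10**9):
        print(n, math.ceil(math.log2(n)))
    # 1000 -> 10, 1000000 -> 20, 1000000000 -> 30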

However, actually enforcing such discrimination requires changing the rules of network maintenance and possibly even the topology. It also requires figuring out how to deter parasites without locking out newcomers.

attack resistance

I won't write about this topic right now, because it is difficult to solve and we need to implement the more basic features first. Nonetheless, I won't be satisfied until we have implemented attack resistance, and I'm sure that attack resistance will interact strongly with the other parts of the architecture.

putting it all together

Unfortunately these concerns can't be solved separately -- a single design will have to deal with routing, network maintenance, and data storage all at once. When we implement the advanced topics, the same design will have to accommodate them as well.

development process

The design of ent is going to evolve significantly as it is implemented and tested. The current goal is to have a very simple Kademlia-like network which has topology, routing, and network maintenance as described above, plus the most naive possible implementation of data storage, and nothing else.

the design

You have one local object of class ent.Node. That is your node in the ent network.

ent.Nodes offer three methods for your use: store(), fetch(), and introduce().

store() takes as argument a block of data. It stores the block on the ent decentralized data store.

fetch() takes as argument a blockId, which is always the SHA1 hash of a block of data. After you've called fetch(), the ent.Node will do its best to acquire a copy of the identified block and give it to you via a callback.

introduce() takes as argument a CommStrat -- an object which gives you the crypto keys and IP address necessary to communicate with a peer. The purpose of introduce() is for you to connect your node to the ent network. Your node has to have at least one connection to the ent network before it will be able to use the ent network to form other connections. In other words, introduce() is really just for bootstrapping -- once you have called it, the ent node should from then on maintain proper connections to the network by itself.
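
Here is a usage sketch of that public interface. The three method names and their arguments are from the description above; the bare ent.Node() constructor, the callback signature, and the bootstrap CommStrat variable are my assumptions, and you need a real CommStrat, obtained out of band, to actually bootstrap.

    import hashlib
    import ent

    node = ent.Node()

    # bootstrapping: hand the node one known peer's CommStrat (crypto
    # keys plus IP address); from here on it maintains its own links
    node.introduce(bootstrap_commstrat)

    block = b"some data"
    node.store(block)

    def got_block(data):
        print("fetched %d bytes" % len(data))

    # blockIds are always the SHA1 hash of the block
    node.fetch(hashlib.sha1(block).digest(), got_block)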

Nodes can interact remotely with other nodes across the network by sending messages. The messages are "store", "fetch", and "look for peer". Whether we are talking about a local user (you) invoking store() or a remote peer (another ent Node) sending "store" will hopefully always be clear from context. In the current implementation, the locally-invoked store() and the remotely-invoked "store" do the same thing and share code.

"store" and "fetch" do what you'd expect. "look for peer" is more subtle. A "look for peer" message comes with three data fields: "targetId", "querier commstrat", and "nonce".

"targetId" is the ID to route to -- the "look for peer" message will be routed to the node who is closest to that ID excluding the querier himself. This exclusion is important so that you can send a "look for peer" with your own ID in the targetId field and thus discover the nearest node to you (instead of discovering yourself).

"querier commstrat" is the comm strat of the node which initiated the query. This is used to perform the aforementioned exclusion and it is also used for the reply. That is: the reply to a "look for peer" does not travel backwards along the route that the query took. Instead the ultimate recipient -- the closest node to the targetId -- connects directly back to the initial querier.

The nonce


Zooko
Last modified: Mon Oct 6 14:10:02 EDT 2003