metux IT service


Nebulon Supercloud + Cluster Storage and Network Neutrality


The Nebulon Supergrid project develops a technology for a heavily distributed, worldwide supergrid that stores large amounts of data virtually anywhere, redundantly and relatively close to the user.

Basics: key-addressed storage

The addressing concept is very different from traditional storage systems such as block devices or file servers. It is derived from Plan 9's Venti storage system, which addresses data by its hash:

Data is split into smaller blocks (e.g. up to 56 kB each) and then uploaded to the storage. In return, the client gets the hash value of the block's data, called its "score". Larger datasets or files are first stored block by block; the resulting list of scores is then stored the same way, and this is repeated until only one score is left. The client therefore only has to know this root score to retrieve the whole file.
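The following Python sketch illustrates this scheme under a few assumptions not stated in the text: scores are SHA-1 hashes (as in Venti), a plain dict stands in for the storage, and every stored block carries a hypothetical one-byte tag that distinguishes data blocks from index (score-list) blocks, so that the root score alone is enough to walk the tree. The names `put`, `write_file` and `read_file` are illustrative only.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 56 * 1024     # maximum block size mentioned in the text
SCORE_LEN = 20             # assumption: SHA-1 scores (20 bytes), as in Venti
DATA, INDEX = b"D", b"I"   # hypothetical one-byte block-type tags

store: dict[bytes, bytes] = {}  # in-memory stand-in for the storage: score -> block

def put(kind: bytes, payload: bytes) -> bytes:
    """Store a tagged block and return its score, the SHA-1 hash of the block."""
    block = kind + payload
    score = hashlib.sha1(block).digest()
    store[score] = block  # equal blocks map to the same score, so nothing is stored twice
    return score

def write_file(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Store data blockwise, then store score lists until only the root score is left."""
    scores = [put(DATA, data[i:i + block_size])
              for i in range(0, len(data), block_size)] or [put(DATA, b"")]
    per_index = max(2, block_size // SCORE_LEN)  # how many scores fit into one index block
    while len(scores) > 1:
        scores = [put(INDEX, b"".join(scores[i:i + per_index]))
                  for i in range(0, len(scores), per_index)]
    return scores[0]

def read_file(root: bytes) -> bytes:
    """Reassemble a file from its root score by walking down the tree."""
    block = store[root]
    kind, payload = block[:1], block[1:]
    if kind == DATA:
        return payload
    children = [payload[i:i + SCORE_LEN] for i in range(0, len(payload), SCORE_LEN)]
    return b"".join(read_file(child) for child in children)
```

`block_size` is a parameter only so that small values can exercise several index levels. Because identical blocks hash to the same score, writing ten identical data blocks leaves a single copy in `store`.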

This approach makes it particularly easy to synchronize and distribute the storage across the network, because every possible content is uniquely identified by its score: existing blocks are never updated, so a block is either present in the network or missing entirely.
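Since blocks are immutable and named by their own hash, synchronizing two stores reduces to copying the scores one side is missing, and each received block can be checked against its name. A minimal sketch, assuming SHA-1 scores and plain dicts as stand-ins for node storage (`score_of` and `sync` are hypothetical names):

```python
from __future__ import annotations

import hashlib

def score_of(block: bytes) -> bytes:
    """Score of a block (assumption: SHA-1, as in Venti)."""
    return hashlib.sha1(block).digest()

def sync(src: dict[bytes, bytes], dst: dict[bytes, bytes]) -> int:
    """Copy every block dst is missing from src; return the number of blocks copied."""
    missing = src.keys() - dst.keys()   # immutable blocks: a score is present or absent, never stale
    for score in missing:
        block = src[score]
        if score_of(block) != score:    # every block can be verified against its own name
            raise ValueError(f"corrupt block {score.hex()}")
        dst[score] = block
    return len(missing)
```

Nothing is ever overwritten, so no conflict resolution is needed, and the same loop works in either direction.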

Nebulon is not the Internet

One thing has to be made clear: Nebulon is not any kind of successor to the Internet. It follows a completely different concept: it is not meant for global end-to-end communication, but is a globally accessible storage that just happens to work over the Internet.

But Nebulon is NOT bound to the Internet: it can run over virtually anything that can route packets from one node to another, even direct dial-up links, leased lines or packet radio.

Each Nebulon node has a set of peer nodes it talks to, no matter which kind of channel is actually used for communication. Just as the Internet sits on top of virtually any kind of physical network and builds a global-scale virtual network, Nebulon sits on top of virtually any network and builds a global-scale superstorage.

Spying, Censorship and Net Neutality

The Internet community currently faces heavy attacks on the Internet's basic principles: governments and major ISPs spy on people's communication, build censorship infrastructure and undermine Net Neutrality.

Nebulon isn't completely immune to these attacks (Freenet might do better here), but it at least makes them much harder.

  • clients can easily encrypt all data blocks, even for public data, so it's almost impossible to selectively block inconvenient data, like criticism and free speech.
  • the peer-to-peer (web-of-trust) nature makes it very hard to block inconvenient users/nodes. Each node needs only one peer node to get into the net; there are no topological limits as in IP's addressing scheme.
  • it's extremely hard to track what an individual user/node uploads to or downloads from the cloud, and even harder to hold them responsible: even if peer nodes have been raided, no one can tell who is really responsible for a given transfer.
  • there are no single servers which could be shut down by corrupt governments, judges or other terrorists.
  • content discrimination (e.g. slowing down or completely blocking content providers who refuse to pay Danegeld) is extremely difficult, since it would require knowing the content/origin of each data block.
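The text does not say how blocks are encrypted. One well-known technique that fits a hash-addressed store is convergent encryption: the key is derived from the block's own content, so identical blocks still deduplicate while storage nodes only ever see ciphertext. The sketch below uses that technique with a toy SHAKE-256 keystream purely for illustration; it is not a vetted cipher and not necessarily what Nebulon does, and `encrypt_block`/`decrypt_block` are hypothetical names.

```python
from __future__ import annotations

import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream from SHAKE-256; for illustration only, not a vetted cipher.
    return hashlib.shake_256(key).digest(length)

def encrypt_block(plain: bytes) -> tuple[bytes, bytes, bytes]:
    """Convergent encryption: return (score, ciphertext, key)."""
    key = hashlib.sha256(plain).digest()     # key derived from the block's content
    cipher = bytes(p ^ k for p, k in zip(plain, _keystream(key, len(plain))))
    score = hashlib.sha1(cipher).digest()    # nodes only see the ciphertext and its score
    return score, cipher, key

def decrypt_block(cipher: bytes, key: bytes) -> bytes:
    """XOR with the same keystream undoes the encryption."""
    return bytes(c ^ k for c, k in zip(cipher, _keystream(key, len(cipher))))
```

The client would have to keep each key next to the corresponding score. Note the trade-off: anyone who already holds the exact plaintext can compute the same score and block it. Encrypting with a random per-upload key avoids this, at the cost of losing deduplication.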

Cloud organisation and data flow

 

The Nebulon cloud consists of a virtually unlimited number of equally structured nodes, each connected to a number of others; there are no special nodes (e.g. supernodes). There is no fixed topology, and a node's connections are not visible to other nodes.

Basically, two major kinds of information travel through the cloud: data blocks and messages.

Data blocks

 

A "data block" is a piece of data identified by its hash value, the "score". Datasets or files larger than a single data block are split into multiple blocks, and the list of their scores is put into an index block. If multiple index blocks are required, the process is repeated until only one index block, and therefore a single score, identifies the whole file. Data is never stored twice, since equal blocks have the same score.

When data is uploaded into the cloud, it is first uploaded to the uploader's own local node; this node may then trigger its peers to fetch the new blocks, and so on. A client requesting a score asks its local node to deliver the data. If the node does not have it yet, it asks its peers, they ask their peers, and so on. The nodes on the way (including the local one) then cache the results for a while, so the more often a score is requested, the more it is replicated in the cloud and the closer it moves to the requesting clients.
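The lookup and caching behaviour can be sketched as a recursive search over the peer graph. This is an illustration under assumptions: the `Node` class, the depth-first order and the shared `visited` set (which prevents loops in a graph without fixed topology) are hypothetical, and scores are assumed to be SHA-1 hashes so that every node can verify a block before caching it.

```python
from __future__ import annotations

import hashlib

class Node:
    """Hypothetical node: a local block cache plus the peers it can ask."""

    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[bytes, bytes] = {}
        self.peers: list[Node] = []

    def get(self, score: bytes, visited: set[str] | None = None) -> bytes | None:
        """Return the block for score, asking peers recursively and caching on the way back."""
        visited = visited if visited is not None else set()
        visited.add(self.name)          # never ask the same node twice
        if score in self.blocks:
            return self.blocks[score]
        for peer in self.peers:
            if peer.name in visited:
                continue
            block = peer.get(score, visited)
            if block is not None and hashlib.sha1(block).digest() == score:
                self.blocks[score] = block  # cache the verified result on the way back
                return block
        return None
```

A real node would more likely query peers in parallel and bound the search with a hop limit instead of walking the whole graph.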

Old or uninteresting data will simply die out eventually. To keep data alive, it just has to be re-requested from time to time.
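The text does not say how a node decides what to drop. Assuming each node keeps a bounded cache with least-recently-used (LRU) eviction, both "dying out" and "staying alive by being re-requested" follow naturally, as this hypothetical `BlockCache` shows:

```python
from __future__ import annotations

from collections import OrderedDict

class BlockCache:
    """Node-local cache with fixed capacity and LRU eviction (an assumed policy)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks: OrderedDict[bytes, bytes] = OrderedDict()

    def get(self, score: bytes) -> bytes | None:
        block = self.blocks.get(score)
        if block is not None:
            self.blocks.move_to_end(score)   # a request refreshes the block
        return block

    def put(self, score: bytes, block: bytes) -> None:
        self.blocks[score] = block
        self.blocks.move_to_end(score)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # the block requested least recently dies out
```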

Endpoint messages

 

In Nebulon, the messages between endpoints are quite small (they should not be larger than typical IP packets) and only carry a small piece of data from one endpoint to another. If more data has to be sent, the message just contains the data's root score.
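A message thus either carries its data inline or refers to it by root score. The sketch below assumes a hypothetical limit of 1200 bytes (to stay below a typical IP packet), a one-byte tag and SHA-1 scores; for brevity the large payload is stored as a single block, whereas a real client would store it as a block tree and send the tree's root score.

```python
from __future__ import annotations

import hashlib

MAX_MESSAGE = 1200  # assumption: stay below a typical IP packet size
SCORE_LEN = 20      # assumption: SHA-1 scores

def make_message(payload: bytes, store: dict[bytes, bytes]) -> bytes:
    """Inline small payloads; for larger ones, store the data and send only its score."""
    if 1 + len(payload) <= MAX_MESSAGE:
        return b"\x00" + payload         # tag 0: data carried inline
    score = hashlib.sha1(payload).digest()
    store[score] = payload               # stand-in for uploading the data into the cloud
    return b"\x01" + score               # tag 1: the message carries the root score

def open_message(msg: bytes, store: dict[bytes, bytes]) -> bytes:
    """Return the payload, fetching it by score if it was not sent inline."""
    tag, body = msg[0], msg[1:]
    return body if tag == 0 else store[body]
```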

Endpoints in this context are a little like TCP sockets, but they are not tied to a particular node. Only the individual node knows whether a certain endpoint is hosted there. Each endpoint is assigned a public/private keypair with which its messages are encrypted. To open (read from) an endpoint you need its full keypair; to send to an endpoint you need only its public key. The public key is also the endpoint's global name.

For public message distribution channels, the private key is published too, so everyone can read from them. This way, news sites or blogs, for example, can easily be built: subscribing to one just means letting your client listen to the right public channel.
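To show who can do what with an endpoint's keys, here is a deliberately insecure sketch using textbook RSA with the classic tiny example primes 61 and 53, encrypting byte by byte. The cipher, key sizes and the `send`/`receive` names are assumptions for illustration only; the text does not specify which public-key scheme Nebulon uses.

```python
from __future__ import annotations

# Textbook RSA with the classic tiny example primes. INSECURE: illustration only.
P, Q, E = 61, 53, 17
N = P * Q                           # 3233
D = pow(E, -1, (P - 1) * (Q - 1))   # 2753; modular inverse via pow needs Python 3.8+

public_key = (N, E)    # anyone may know this; it also serves as the endpoint's name
private_key = (N, D)   # required to read from the endpoint

def send(pubkey: tuple[int, int], message: bytes) -> list[int]:
    """Encrypt for an endpoint knowing only its public key (byte by byte, for brevity)."""
    n, e = pubkey
    return [pow(b, e, n) for b in message]

def receive(privkey: tuple[int, int], cipher: list[int]) -> bytes:
    """Decrypt with the private key; without it the endpoint cannot be read."""
    n, d = privkey
    return bytes(pow(c, d, n) for c in cipher)
```

For a public channel, `private_key` would simply be published next to `public_key`, so any subscriber can call `receive`.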

Messages are not routed directly to an endpoint (because nobody knows where the endpoint actually is) but are simply spread through the whole cloud. This should not be particularly problematic, since these messages are very small and only signal that something important has happened (e.g. a mail storage or news feed was updated), not the data itself. Still, application developers should be careful to send out as few messages as possible.
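Spreading a message through the whole cloud is essentially flooding: each node forwards it to all its peers and drops copies it has already seen. A sketch, assuming a hypothetical per-node `seen` set keyed by message ID and a plain adjacency dict for the peer graph:

```python
from __future__ import annotations

from collections import deque

def flood(peers: dict[str, list[str]], origin: str, msg_id: str,
          seen: dict[str, set[str]]) -> list[str]:
    """Spread a message from origin to every reachable node; return nodes in handling order."""
    handled = []
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if msg_id in seen.setdefault(node, set()):
            continue                       # each node handles a given message only once
        seen[node].add(msg_id)
        handled.append(node)               # here a node would check whether it hosts the endpoint
        queue.extend(peers.get(node, []))  # forward to all peers: nobody knows where the endpoint is
    return handled
```

Every message travels over every link of the reachable graph, which is why applications should send as few of them as possible.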

Use cases

Media hosting

Media content is normally created once and then left unchanged. The media files are simply stored in VtStore format (including metadata).

Access from the outside (e.g. via HTTP) runs through a linear cluster of VtStore-to-HTTP proxies. New proxy nodes can be added or removed trivially at any time. No large initial data transfers are necessary: mirroring happens on demand.
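Because a proxy only needs a cache and a way to fetch missing blocks from the cloud, any proxy can answer any request, which is what makes adding or removing proxies cheap. The sketch below is only the request-handling core; the `/vt/<hex score>` URL scheme, the `fetch` callback and SHA-1 scores are assumptions, and the function could be wired into `http.server.BaseHTTPRequestHandler.do_GET` or any other web framework.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Optional

def handle_get(path: str, cache: dict[bytes, bytes],
               fetch: Callable[[bytes], Optional[bytes]]) -> tuple[int, bytes]:
    """Serve GET /vt/<hex score>; return (HTTP status, body)."""
    prefix = "/vt/"
    if not path.startswith(prefix):
        return 404, b""
    try:
        score = bytes.fromhex(path[len(prefix):])
    except ValueError:
        return 400, b""
    block = cache.get(score)
    if block is None:
        block = fetch(score)                 # mirror on demand from the cloud
        if block is None or hashlib.sha1(block).digest() != score:
            return 404, b""
        cache[score] = block                 # later requests are served locally
    return 200, block
```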

Storage Area Networks and RAID replacement

Using a simple block device emulation, virtual disks and partitions can easily be provided through the Nebulon Cloud. The client systems may run their own Nebulon node for performance and redundancy, but do not need to. The need for backups is dramatically reduced: for individual servers, only a minimal boot system and a few kilobytes of status data have to be backed up; all payload is stored redundantly within the Nebulon Cloud.
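A block device can be emulated by mapping each block number to a score. Writes store new blocks and update the table; the table itself is then stored like any other data, so its score is the small piece of status data that needs backing up. The `VirtualDisk` class, the 4 KiB block size and SHA-1 scores are assumptions for this sketch, and a plain dict stands in for the Nebulon Cloud.

```python
from __future__ import annotations

import hashlib

SECTOR = 4096    # hypothetical virtual-disk block size
SCORE_LEN = 20   # assumption: SHA-1 scores

class VirtualDisk:
    """Hypothetical block-device emulation on top of a content-addressed store."""

    def __init__(self, store: dict[bytes, bytes], num_blocks: int):
        self.store = store
        zero = self._put(bytes(SECTOR))      # unwritten blocks all share one score
        self.table = [zero] * num_blocks     # block number -> score

    def _put(self, block: bytes) -> bytes:
        score = hashlib.sha1(block).digest()
        self.store[score] = block
        return score

    def write(self, lba: int, data: bytes) -> None:
        if len(data) != SECTOR:
            raise ValueError("writes must cover exactly one block")
        self.table[lba] = self._put(data)    # new content gets a new score; old blocks are untouched

    def read(self, lba: int) -> bytes:
        return self.store[self.table[lba]]

    def snapshot(self) -> bytes:
        """Store the block table and return its score: the only state that needs a backup."""
        return self._put(b"".join(self.table))

    @classmethod
    def restore(cls, store: dict[bytes, bytes], root: bytes) -> VirtualDisk:
        """Rebuild a disk from a snapshot score; the blocks themselves come from the store."""
        disk = cls.__new__(cls)
        disk.store = store
        raw = store[root]
        disk.table = [raw[i:i + SCORE_LEN] for i in range(0, len(raw), SCORE_LEN)]
        return disk
```

For a large disk the table would itself be split into index blocks, the same way large files are, instead of being stored as one block.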

By running a Nebulon-based cluster filesystem (e.g. a modified fossil), the Nebulon Cloud can provide a fail-safe, redundant and caching cluster filesystem. Bringing up new nodes does not require large data transfers; all necessary data is fetched automatically on demand.