Partial storage nodes are nodes that only store some of the blocks in the blockchain, and can be queried by any other nodes (including light clients and other partial nodes) to download data from parts of the chain.
There are two main questions:
- Granularity: How should the chain be sharded (e.g. per 1 block, 10 blocks, etc)?
- Peer discovery: How should peers associated with the shards be discovered?
I propose a method of answering the above, with a scalable "tree-based" approach.
Granularity
Let's assume a network-wide constant MIN_GRANULARITY = 1000 blocks, where MIN_GRANULARITY is the minimum number of consecutive blocks that a node can advertise to the network as stored (which we call a "blockset"), and a constant BASE = 10. We call a range of blocksets a "blockrange" (e.g. blockrange 0-10K consists of blocksets 0-1K, 1K-2K, ..., 9K-10K). We can organise the blocksets into a directory structure, where each directory has BASE subdirectories (blockranges) or files (blocksets). Let's say there are 10 million blocks in the chain; the directory would look as follows:
0-10M/
├─ 0-1M/
│ ├─ 0-100K/
│ │ ├─ 0-10K/
│ │ │ ├─ 0-1K
│ │ │ ├─ 1K-2K
│ │ │ ├─ ...
│ │ │ ├─ 9K-10K
│ ├─ 100K-200K/
│ ├─ .../
│ ├─ 900K-1M/
├─ 1M-2M/
├─ .../
├─ 9M-10M/
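As a rough illustration of how the tree above can be derived, here is a minimal Python sketch that lists every directory containing a given block, from the root down to its blockset. The function names `format_height` and `path_to_block` are hypothetical, and the sketch assumes blocks are numbered from 0 and that each range is half-open (0-1K covers blocks 0 to 999); the post does not specify either.

```python
MIN_GRANULARITY = 1000  # smallest advertisable unit (a blockset), as in the text
BASE = 10               # number of children per directory, as in the text


def format_height(h: int) -> str:
    """Render a height the way the tree does: 0, 1K, 100K, 1M, ..."""
    if h and h % 1_000_000 == 0:
        return f"{h // 1_000_000}M"
    if h and h % 1_000 == 0:
        return f"{h // 1_000}K"
    return str(h)


def path_to_block(block: int, chain_length: int) -> list[str]:
    """List the topics containing `block`, from the root down to its blockset."""
    # Root size: the smallest MIN_GRANULARITY * BASE**k that covers the chain.
    size = MIN_GRANULARITY
    while size < chain_length:
        size *= BASE
    path = []
    while size >= MIN_GRANULARITY:
        start = (block // size) * size  # round down to a multiple of `size`
        path.append(f"{format_height(start)}-{format_height(start + size)}")
        size //= BASE
    return path


print(path_to_block(1500, 10_000_000))
# ['0-10M', '0-1M', '0-100K', '0-10K', '1K-2K']
```

Because every level is just the block height rounded down to a multiple of MIN_GRANULARITY * BASE^k, any node can compute these names locally without coordination.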
Peer discovery
Subnet-based
Each subdirectory (blockrange) or file (blockset) would be its own network topic. For example, a topic could be 0-10K (blockrange) or 0-1K (blockset). The network has the following interfaces:
- GetPeers(topic) returns some IP addresses for peers that have advertised that they are serving the blockrange/set for topic.
- Advertise(topic, ip) advertises that a node with IP address ip is serving the blockrange/set for topic.
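To make the two interfaces concrete, here is a toy in-memory stand-in, sketched in Python. It is not a real p2p layer: the class name `InMemoryDiscovery` and the `limit` parameter are assumptions made for illustration, and an actual network would back these calls with something like a DHT or pubsub topics.

```python
from collections import defaultdict


class InMemoryDiscovery:
    """Toy stand-in for the discovery layer (hypothetical, single process)."""

    def __init__(self) -> None:
        # topic name -> set of IP addresses that advertised it
        self._peers: dict[str, set[str]] = defaultdict(set)

    def advertise(self, topic: str, ip: str) -> None:
        """Record that the node at `ip` serves the blockrange/set `topic`."""
        self._peers[topic].add(ip)

    def get_peers(self, topic: str, limit: int = 8) -> list[str]:
        """Return up to `limit` IPs that advertised `topic` (`limit` is assumed)."""
        return sorted(self._peers.get(topic, set()))[:limit]


net = InMemoryDiscovery()
net.advertise("0-10K", "203.0.113.5")   # blockrange topic
net.advertise("0-1K", "203.0.113.5")    # blockset topic
net.advertise("0-10K", "198.51.100.7")
print(net.get_peers("0-10K"))  # ['198.51.100.7', '203.0.113.5']
print(net.get_peers("1K-2K"))  # []
```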
The above operations might be expensive or time-consuming. Therefore, depending on how many blocks and blockranges there are in the network, partial storage nodes may only advertise up to a certain height of blockranges, and likewise clients querying the nodes might only try to get peers from a certain height of blockranges. Let's assume a client-side variable GRANULARITY, where GRANULARITY >= MIN_GRANULARITY, on both partial storage nodes and client nodes.
When a partial storage node wants to call Advertise() on blockranges that it's serving, it will only do so on blockranges whose size is at least GRANULARITY. For example, if a partial storage node is serving blocks 0-1M and GRANULARITY = 100,000, then it will call Advertise() on 0-1M, 0-100K, ..., 900K-1M, but not on the smaller 10K blockranges (0-10K, 10K-20K, ...) or the 1K blocksets (0-1K, ..., 9K-10K, etc).
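The advertising rule can be sketched as follows. This is one possible reading, not a specified algorithm: the function name `topics_to_advertise` is hypothetical, topics are represented as half-open `(start, end)` tuples for brevity, and a blockrange is only advertised if the node stores all of it.

```python
MIN_GRANULARITY = 1000  # network-wide constant from the text
BASE = 10               # network-wide constant from the text


def topics_to_advertise(start, end, granularity, chain_length):
    """Blockranges fully inside [start, end) whose size is >= granularity."""
    # Root size: the smallest MIN_GRANULARITY * BASE**k that covers the chain.
    size = MIN_GRANULARITY
    while size < chain_length:
        size *= BASE
    topics = []
    while size >= max(granularity, MIN_GRANULARITY):
        first = -(-start // size) * size  # round `start` up to a multiple of size
        for lo in range(first, end - size + 1, size):
            topics.append((lo, lo + size))
        size //= BASE  # descend one level in the tree
    return topics


topics = topics_to_advertise(0, 1_000_000, 100_000, 10_000_000)
print(len(topics))  # 11: the 0-1M blockrange plus ten 100K blockranges
print(topics[:2])   # [(0, 1000000), (0, 100000)]
```

With GRANULARITY = 100,000 the node makes 11 Advertise() calls; lowering it to 10,000 would add another 100, which is the cost trade-off the variable is meant to control.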
Similarly, if a client wants to download data in block 1500, for example, the deepest blockrange it would try to call GetPeers() for is 0-100K (with GRANULARITY = 100,000 as above). One can also construct different algorithms to find peers, using a top-to-bottom approach. For example, the client can first call GetPeers() on blocks 0-10M, but if no node is storing all 10M blocks, it could then try calling GetPeers() on blocks 0-1M, and so on.
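A minimal sketch of such a top-to-bottom search is shown below. The function name `find_peers` is hypothetical, `get_peers` is any callable standing in for GetPeers(), and topics are again written as half-open `(start, end)` tuples; none of these details come from the post itself.

```python
MIN_GRANULARITY = 1000  # network-wide constant from the text
BASE = 10               # network-wide constant from the text


def find_peers(block, chain_length, granularity, get_peers):
    """Try the largest blockrange containing `block` first, then descend.

    Stops at the first topic with peers, or once the next blockrange would
    be smaller than `granularity`. Returns (topic, peers) or (None, []).
    """
    size = MIN_GRANULARITY
    while size < chain_length:
        size *= BASE
    while size >= max(granularity, MIN_GRANULARITY):
        lo = (block // size) * size
        topic = (lo, lo + size)
        peers = get_peers(topic)
        if peers:
            return topic, peers
        size //= BASE
    return None, []


# Hypothetical advertisements: one node serving blocks 0-1M.
advertised = {
    (0, 1_000_000): ["203.0.113.5"],
    (0, 100_000): ["203.0.113.5"],
}
lookup = lambda topic: advertised.get(topic, [])

print(find_peers(1500, 10_000_000, 100_000, lookup))
# ((0, 1000000), ['203.0.113.5'])
```

Starting from the top favours peers that store large ranges, so a client fetching many nearby blocks can keep using the same peer; a bottom-up order would be an equally valid choice with different trade-offs.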
This would allow the network to self-adjust how much data each shard covers, depending on how big blocks are or how much storage partial nodes have.
Note: GRANULARITY is a client-side variable that can be adjusted automatically by the client itself based on its success in downloading blocks at different granularities. On the other hand, MIN_GRANULARITY and BASE are network-wide constants that have to be agreed upon as part of the p2p protocol.
Status message-based
An alternative to a subnet-based peer discovery approach is one where there's only a single network of partial storage nodes, each of which exposes a status message describing which blocks it has. Partial storage nodes would have the following interface:
- GetStatus(GRANULARITY), where GRANULARITY >= MIN_GRANULARITY, returns a bit field in which the index of each bit corresponds to a blockrange of size GRANULARITY, and a set bit means that the node has the blocks in that blockrange.
For example, if GetStatus(1M) is called in a chain with 10M blocks, and the partial storage node is only storing blocks 1M-2M, the bit field would be as follows:
0100000000
 ^
 |
 blockrange 1M-2M
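The bit field can be computed with a few lines of Python. This is only a sketch under stated assumptions: the function name `get_status`, the representation of stored data as a list of half-open `(start, end)` ranges, and returning the field as a string of '0'/'1' characters are all choices made here for readability, not part of the proposal.

```python
def get_status(stored, granularity, chain_length):
    """One bit per blockrange of size `granularity`; '1' if fully stored.

    `stored` is a list of half-open (start, end) ranges. For simplicity a
    blockrange counts as stored only if a single entry covers it entirely,
    so adjacent entries should be merged before calling this.
    """
    bits = []
    for lo in range(0, chain_length, granularity):
        hi = lo + granularity
        has_all = any(s <= lo and hi <= e for s, e in stored)
        bits.append("1" if has_all else "0")
    return "".join(bits)


print(get_status([(1_000_000, 2_000_000)], 1_000_000, 10_000_000))
# 0100000000
```

Calling the same function with a granularity of 100K on the same node would return a 100-bit field with bits 10 to 19 set, showing how a finer GRANULARITY trades a larger message for more precise information.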