storj / whitepaper

The Storj Whitepaper

Home Page: https://storj.io/whitepaper

License: Other

TeX 99.71% Makefile 0.11% Python 0.18%

whitepaper paper research storj

whitepaper's Introduction

Storj Whitepaper V3

Available at https://storj.io/whitepaper

whitepaper's People

Contributors

Stargazers

Watchers

Forkers

rnabel nginnever siemantic wobytes othebault miseclab elek

whitepaper's Issues

Bundle of Suggestions

Page 1

Section should be renamed authors
Vitalik should be listed as a contributor
"as data availability is a function of popularity, rather than utility" like this line

Page 2

The use of "autonomous agents" seems a little strange in the current context. Perhaps reference a section when you talk about automatic negotiation, or change the wording.

Page 3

The word "unrelated" could probably be changed to something that fits better
Step 3 seems out of place. The user has not read about the audit process yet.

Page 4

Remove Section 2.2.2 Triggers. We currently don't use Triggers anywhere, and I don't think we have any plan to use them at the current time. It's really cool, but we should probably save the user's attention.

Page 8

Love the live goats example.
You included one of the three examples of SJCX usage and value in the network. The other two should be included.

SJCX Value

SJCX is the primary token of the Storj network. Its primary use is to allow the buying and selling of storage space on the decentralized network. We chose to use SJCX over Bitcoin and many other cryptocurrencies for the following reasons, among others:

Bitcoin is not divisible enough to support the granularity of storage and bandwidth transactions needed.
We intend to use hub and spoke micropayments for storage and bandwidth. This requires large amounts of tokens available in the hubs to keep the transaction fees low but can be underwritten by the network and tokens, for the benefit of all.
Due to external factors, a coin’s price may rise or fall dramatically. This leads to farmers adding or removing drives from the network (because their incentives change) in a way that doesn’t reflect the needs of the network. Having a network-specific token limits those external influences and factors.

Page 9

As of this writing, Quasar is hop-based, not TTL-based.

Page 10

Remove Section 2.7.3 Layered Erasure Coding. Not necessary and may confuse the user.

Page 13

@boshevski Missing figure

Page 14

LaTeX issue. FIND_TUNNEL is way off the boundaries.

Page 15

We will be retiring WebSockets very soon. @bookchin will probably help you fill this out.

Page 18

Don't believe the word "jurisdiction" is the best word choice considering the audience.
Don't believe the word "management" is the best word choice considering the audience.

Page 21

Section 3.5 paragraph two needs to be reworked, sounds a bit weird

Page 22

Section 4.2 paragraph 4 should read "No application can be completely secure, but auditable code is the best defense of users' privacy and security."

Page 27

Section 5.4 Hostage Bytes. The original Hostage Bytes included: "As long as the client keeps the bounds of its erasure encoding a secret, the malicious farmer cannot know what the last byte is." This is missing from the new Hostage Bytes section. Both redundancy and keeping the redundancy/erasure encoding bounds secret are a defense against this attack.

Page 31

Eliminate A.5 16. Triggers

Bridge doesn't transit shards

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L278

Bridge is designed to store only metadata, and to never transit or store shards.

Doesn't the bridge transit shards during the rebuild process?

Public files without a bridge

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L334

This paragraph focuses on how the Bridge facilitates sharing an encryption key for public files.

Had a very productive side channel with @frdwrd on RocketChat:

you could envision a system where file pointers and keys are posted publicly anywhere, and clients pay for downloads on their own and then check hashes. No bridge needed there.

Would a second paragraph here describing how public buckets could work without a Bridge be useful?

3.3.4 and 4.2 overlap

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L337-L343

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L369-L377

KFS deep dive

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L192-L249

This section is a bit deep for an implementation detail of the reference implementation of the storj protocol. Would it make sense to break this out into a separate white paper? It could stand on its own.

Are we sure node failure isn't correlated?

Is this substantiated?

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L58

Because peers generally rely on separate hardware and infrastructure, data failure is not correlated

Quasar Topics

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L158

Current topic options include size and bandwidth commitment

It isn't clear how this is implemented. Perhaps a brief paragraph explaining how a farmer would subscribe to topics for their bandwidth/size commitment, and how clients would announce?

Partial Audit Confidence Level

Consider a partial audit covering n bytes of an N-byte file with K modified bytes, where the audit catches k >= 1 of them.

It's a hypergeometric distribution.
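
As a rough illustration of the hypergeometric point, here is a minimal sketch that computes the chance that an audit of n distinct bytes hits at least one of the K modified bytes. It assumes the audited bytes are chosen uniformly at random without replacement; the function name `detection_probability` is hypothetical and not taken from the paper.

```python
from math import comb


def detection_probability(N: int, K: int, n: int) -> float:
    """P(k >= 1): chance that n sampled bytes include a modified byte.

    N: file size in bytes, K: number of modified bytes, n: bytes audited.
    Assumes sampling without replacement (hypergeometric distribution).
    Exact integer arithmetic is used, so this is slow for very large N.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    # P(k = 0) is the fraction of n-byte samples drawn only from clean bytes.
    p_miss = comb(N - K, n) / comb(N, n)
    return 1.0 - p_miss
```

Auditing half of a 10-byte file with a single modified byte, `detection_probability(10, 1, 5)`, gives a confidence of 0.5.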

TTL replaced by hops

Shard encryption

The paper states that shards are independently encrypted, but in actuality the complete file is encrypted then sharded. This is to prevent an attacker from being able to read a segment of a file in the unlikely event a shard becomes compromised - they must retrieve all of the shards to read any of them (by default).

Gordon's Changes

2.4 - update OFFER flow. The initial publish is partially constructed; every OFFER is fully constructed
storj implements kad-quasar -> storj implements quasar
better way to express kat thing
NAT and diglet -> NAT traversal and Reverse HTTP Tunneling
Tunnels are currently operated over websocket -> tcp socket
remove all references to data channels

Data channels

new data transfer: just HTTP.
POST to /shards/{shardhash}?token={token}
GET /shards/{hash}?token={token}
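
A minimal sketch of how a client might build these two requests, using only the Python standard library. The farmer address `http://farmer.example:4000` and the function names are hypothetical; only the `/shards/{hash}?token={token}` path shape comes from the notes above. Nothing is sent over the network here.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def shard_url(farmer_base: str, shard_hash: str, token: str) -> str:
    """Build the /shards/{hash}?token={token} URL for a farmer (assumed layout)."""
    path = "/shards/" + quote(shard_hash, safe="")
    return farmer_base.rstrip("/") + path + "?" + urlencode({"token": token})


def shard_request(method: str, farmer_base: str, shard_hash: str,
                  token: str, body: Optional[bytes] = None) -> Request:
    """Prepare (but do not send) an upload (POST) or download (GET) request."""
    if method not in ("GET", "POST"):
        raise ValueError("only GET and POST are described in the notes")
    return Request(shard_url(farmer_base, shard_hash, token),
                   data=body, method=method)


# Hypothetical usage: prepare an upload and a download for the same shard.
upload = shard_request("POST", "http://farmer.example:4000", "abc123", "t0k", b"data")
download = shard_request("GET", "http://farmer.example:4000", "abc123", "t0k")
```

Passing either object to `urllib.request.urlopen` would perform the transfer against a real farmer.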

Client sends challenge number?

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L124

These messages are sent from data owners to farmers and contain the hash of the data and a challenge number.

challenge number

Isn't the challenge itself sent? If not, the paper isn't clear on how the farmer receives the challenge.
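
To make the question concrete, here is a minimal sketch of one way the exchange could work if the challenge itself is sent. It assumes the challenge is a random salt and that the farmer answers with a hash of the salt prepended to the shard; SHA-256 is a stand-in, and all function names are hypothetical rather than taken from the paper.

```python
import hashlib
import hmac
import os


def make_challenge(length: int = 32) -> bytes:
    """Data owner: generate a random challenge salt before upload."""
    return os.urandom(length)


def respond(challenge: bytes, shard: bytes) -> str:
    """Farmer: prove possession by hashing the challenge with the shard."""
    return hashlib.sha256(challenge + shard).hexdigest()


def verify(expected: str, response: str) -> bool:
    """Data owner: compare against the answer precomputed before upload."""
    return hmac.compare_digest(expected, response)


# Hypothetical round trip: the owner precomputes the answer, then audits.
shard = b"example shard bytes"
challenge = make_challenge()
expected = respond(challenge, shard)          # computed while owner holds the data
assert verify(expected, respond(challenge, shard))
```

Under this reading the farmer cannot answer without receiving the challenge bytes, which is why sending only a challenge number would leave a gap.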

[WIP] v2.1 plans

Add graphic for Offer Loop
Add graphic for Quasar Loop

Section 3 Disagreement

I disagree with the intro paragraphs of section 3.

Many of these functions require high uptime and significant infrastructure, especially for an active set of files. User run applications, like a file syncing application, cannot be expected to efficiently manage files on the network.

This is no different than running Dropbox or another cloud-syncing tool. Let's take Transmission (a popular BitTorrent client) as an example. I run this on my Linux NAS. What actually runs is transmission-daemon, a background process that runs constantly, persisting across crashes/reboots/etc. Even if I am not downloading/seeding a torrent, this is running and my node is actively participating in DHT/PEX.

When I want to share/download a torrent file, I use transmission-client to make RPC calls to the daemon and provide the file or magnet URL. If I want to stop a download, or delete a torrent out of my offering, I use the -client.

I see storj (from client/user perspective, not farmer) no differently. Even on Mac/Windows, a process (actually, probably a couple threads) runs/lives in the background, joining with other nodes, doing PING/PONGs, participating in DHT, etc.

When I want to store a file, I interact with the GUI and the app slices up the file, encrypts it, stores audit/challenge information locally, sends out contract requests, and uploads to farmers. It would be an app preference as to how many mirror copies the user wants living on the network and would be the responsibility of the app to ensure this.

Since the app is intended to always be running (like DB/ACD/etc), it would also be the app's responsibility to do hourly/daily/periodic audits of all files.

As it's written, the paper seems to lean towards the bridge being a required piece to this puzzle. Yet, this creates a dependency and a single-point-of-failure and defeats the whole purpose behind a distributed/shared-nothing storage platform.

Are Sybil attacks really deterred through proof of work?

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L78

This line claims that the Node ID being a bitcoin address deters both the Sybil and the Eclipse attacks. Section 5.2 references how this dramatically increases the difficulty of an Eclipse attack, but doesn't mention anything about how this helps with the Sybil attack.

There are references later in the paper about a trust system built on long-term identity, perhaps this is referencing that?

Sybil: https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L413
Eclipse: https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L422

Graphics, equations, tables

several TODO sections

Introduction should have references

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L39

The introduction currently makes claims about the shortcomings of cloud service providers. These should be linked to references backing the claims.

Sentence is confusing to me

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L355

Many of the features Bridge servers provide, like permissioning and intelligent contracting, leverage considerable network effects. Larger data sets create far better performance for clients.

I don't understand what it means by 'leverage considerable network effects'. I'm translating this to 'leverage considerable knowledge of the network' when I read it.

Depth of the Merkle tree

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L92

The data owner stores the set of challenges, the Merkle root and the depth of the Merkle tree

This diagram makes it seem like the depth of the tree is always 5
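
To illustrate why the depth is not fixed, here is a minimal sketch relating the number of challenges (leaves) to the tree depth and root. It assumes a binary tree padded to a power of two with the hash of an empty string, and uses SHA-256 as a stand-in hash; both choices and the function names are assumptions, not details confirmed by the paper.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_depth(num_leaves: int) -> int:
    """Depth of a binary tree with num_leaves leaves, i.e. ceil(log2(n))."""
    if num_leaves < 1:
        raise ValueError("need at least one leaf")
    return (num_leaves - 1).bit_length()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Hash the leaves, pad to a power of two, and fold pairs up to the root."""
    size = 1 << merkle_depth(len(leaves))
    level = [_h(leaf) for leaf in leaves] + [_h(b"")] * (size - len(leaves))
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Under these assumptions a depth of 5 corresponds to anywhere from 17 to 32 challenges, so a diagram with depth 5 shows just one case.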

Not fully clear what a paragraph is saying

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L352

This function is nearly impossible to distribute, as the retrieval and processing overhead of a distributed network are unsuitable to the high-performance demands of most storage applications

I've read this several times and don't fully understand what is being said. In the context of the section, I can gather that it's trying to make the case that network information isn't a problem that can be solved in a P2P way, but I don't understand the reason it is giving.

proof-of-burn introduction or citation

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L398

This system relies on miners’ fees as a substitute for proof-of-burn.

Makes the assumption that the reader knows what proof-of-burn is. Perhaps a citation or a quick sentence giving context to the phrase?

Unclear sentence in Payments

Section 2.5 (Payments) "Micropayment networks, like the Lightning Network, Implementation details of other payment strategies are left as an exercise for interested parties."

The sentence doesn't make sense - what exactly was the intention? It is most likely that networks like the Lightning Network are viable alternatives to Storjcoin, in which case just a few words are missing.

Delegating trust contradiction

https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L357

Application developers then delegate trust to the Bridge, exactly as they would to a traditional object store. This shifts significant operational burdens from the application developer to the service-provider with minimal trust delegation.

This statement says that application developers delegate trust to the Bridge in exactly the same way as to a traditional object store, but also that they delegate less trust than they would to a traditional object store.
