Available at https://storj.io/whitepaper
The Storj Whitepaper
Home Page: https://storj.io/whitepaper
License: Other
SJCX is the primary token of the Storj network. Its primary use is to allow the buying and selling of storage space on the decentralized network. We chose to use SJCX over Bitcoin and many other cryptocurrencies for just a few of the following reasons:
https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L278
Bridge is designed to store only metadata, and to never transit or store shards.
Doesn't the bridge transit shards during the rebuild process?
https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L334
This paragraph focuses on how the Bridge facilitates sharing an encryption key for public files.
Had a very productive side-channel conversation with @frdwrd on RocketChat:
you could envision a system where file pointers and keys are posted publicly anywhere, and clients pay for downloads on their own and then check hashes. No bridge needed there.
Would a second paragraph here describing how public buckets could work without a Bridge be useful?
https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L192-L249
This section is a bit deep for an implementation detail of the reference implementation of the storj protocol. Would it make sense to break this out into a separate white paper? It could stand on its own.
Is this substantiated?
https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L58
Because peers generally rely on separate hardware and infrastructure, data failure is not correlated
https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L158
Current topic options include size and bandwidth commitment
It isn't clear how this is implemented. Perhaps a brief paragraph explaining how a farmer would subscribe to topics for their bandwidth/size commitment, and how clients would announce?
For a partial audit covering n bytes of an N byte file, with K modified bytes, and k >= 1.
It's a hypergeometric distribution.
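To make the hypergeometric point concrete, here is a minimal sketch. It assumes the n audited bytes are drawn uniformly at random without replacement from the N-byte file, so the number k of modified bytes in the sample is hypergeometric and P(k >= 1) = 1 - C(N-K, n) / C(N, n). The function name `detection_probability` is made up for illustration and is not taken from the paper.

```python
from math import comb


def detection_probability(N: int, K: int, n: int) -> float:
    """Chance that an audit of n bytes touches at least one of K modified
    bytes in an N-byte file, assuming uniform sampling without replacement."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # P(k = 0): every audited byte comes from the N - K unmodified bytes.
    p_miss = comb(N - K, n) / comb(N, n)
    return 1.0 - p_miss
```

For example, `detection_probability(1000, 10, 100)` gives the chance that auditing 100 of 1000 bytes catches a 10-byte modification under these assumptions.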
The paper states that shards are independently encrypted, but in actuality the complete file is encrypted and then sharded. This is to prevent an attacker from being able to read a segment of a file in the unlikely event a shard becomes compromised: they must retrieve all of the shards to read any of them (by default).
Data channels
https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L124
These messages are sent from data owners to farmers and contain the hash of the data and a challenge number.
challenge number
Isn't the challenge itself sent? If not, the paper isn't clear on how the farmer receives the challenge.
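A simplified sketch of why the farmer seems to need the challenge value itself, not just an index. This is not the paper's exact construction (which also involves a Merkle tree); it assumes a plain SHA-256 over the challenge concatenated with the shard, and the names `answer_challenge`, `prepare_audits` and `verify_audit` are hypothetical.

```python
import hashlib
import hmac
import os
from typing import List, Tuple


def answer_challenge(challenge: bytes, shard: bytes) -> bytes:
    """What a farmer would compute: it needs both the challenge and the shard."""
    return hashlib.sha256(challenge + shard).digest()


def prepare_audits(shard: bytes, count: int) -> List[Tuple[bytes, bytes]]:
    """Data owner precomputes (challenge, expected answer) pairs before upload."""
    audits = []
    for _ in range(count):
        challenge = os.urandom(32)  # random, unpredictable challenge value
        audits.append((challenge, answer_challenge(challenge, shard)))
    return audits


def verify_audit(expected: bytes, response: bytes) -> bool:
    """Data owner compares the farmer's response with the stored answer."""
    return hmac.compare_digest(expected, response)
```

In this sketch a farmer that receives only a challenge number, and not the challenge bytes, has nothing to hash with the shard, which is the gap the question above points at.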
I disagree with the intro paragraphs of section 3.
Many of these functions require high uptime and significant infrastructure, especially for an active set of files. User run applications, like a file syncing application, cannot be expected to efficiently manage files on the network.
This is no different than running Dropbox or another cloud-syncing tool. Let's take Transmission (a popular BitTorrent client) as an example. I run this on my Linux NAS. What actually runs is transmission-daemon, a background process that runs constantly, persisting across crashes/reboots/etc. Even if I am not downloading/seeding a torrent, this is running and my node is actively participating in DHT/PEX.
When I want to share/download a torrent file, I use transmission-client to make RPC calls to the daemon and provide the file or magnet URL. If I want to stop a download, or delete a torrent out of my offering, I use the -client.
I see Storj (from the client/user perspective, not the farmer's) no differently. Even on Mac/Windows, a process (actually, probably a couple of threads) runs/lives in the background, joining with other nodes, doing PING/PONGs, participating in DHT, etc.
When I want to store a file, I interact with the GUI and the app slices up the file, encrypts it, stores audit/challenge information locally, sends out contract requests, and uploads to farmers. It would be an app preference as to how many mirror copies the user wants living on the network and would be the responsibility of the app to ensure this.
Since the app is intended to always be running (like DB/ACD/etc), it would also be the app's responsibility to do hourly/daily/periodic audits of all files.
As it's written, the paper seems to lean towards the bridge being a required piece of this puzzle. Yet this creates a dependency and a single point of failure, and defeats the whole purpose behind a distributed/shared-nothing storage platform.
https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L78
This line claims that the Node ID being a Bitcoin address deters both the Sybil and the Eclipse attacks. Section 5.2 references how this dramatically increases the difficulty of an Eclipse attack, but doesn't mention anything about how this helps with the Sybil attack.
There are references later in the paper to a trust system built on long-term identity; perhaps this is referencing that?
Sybil: https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L413
Eclipse: https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L422
several TODO sections
https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L39
The paper currently makes claims about the shortcomings of cloud service providers; these should be linked to references backing them.
https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L355
Many of the features Bridge servers provide, like permissioning and intelligent contracting, leverage considerable network effects. Larger data sets create far better performance for clients.
I don't understand what it means by 'leverage considerable network effects'. I'm translating this to 'leverage considerable knowledge of the network' when I read it.
https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L92
The data owner stores the set of challenges, the Merkle root and the depth of the Merkle tree
This diagram makes it seem like the depth of the tree is always 5
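As an illustration of why the depth is worth stating explicitly, here is a minimal sketch of a binary Merkle tree whose depth follows from the number of leaves (here, one leaf per challenge). The padding rule (filling up to the next power of two with the hash of an empty string) and the function name are assumptions made for this example, not details confirmed by the paper.

```python
import hashlib
from typing import List, Tuple


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_depth(leaves: List[bytes]) -> Tuple[bytes, int]:
    """Return (root, depth), where depth counts levels above the leaf hashes."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [_sha256(leaf) for leaf in leaves]
    # Assumed padding rule: extend to the next power of two with filler hashes.
    size = 1
    while size < len(level):
        size *= 2
    level += [_sha256(b"")] * (size - len(level))
    depth = 0
    while len(level) > 1:
        # Hash adjacent pairs to build the next level up.
        level = [_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth
```

Under this construction 32 challenges give a depth of 5, while 1000 challenges give a depth of 10, so the depth is a property of the challenge count and not a fixed constant.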
https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L352
This function is nearly impossible to distribute, as the retrieval and processing overhead of a distributed network are unsuitable to the high-performance demands of most storage applications
I've read this several times and don't fully understand what is being said. In the context of the section, I can gather that it's trying to make the case that network information isn't a problem that can be solved in a P2P way, but I don't understand the reason it is giving.
https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L398
This system relies on miners’ fees as a substitute for proof-of-burn.
Makes the assumption that the reader knows what proof-of-burn is. Perhaps a citation or a quick sentence giving context to the phrase?
Section 2.5 (Payments) "Micropayment networks, like the Lightning Network, Implementation details of other payment strategies are left as an exercise for interested parties."
The sentence doesn't make sense - what exactly was the intention? It is most likely that networks like the Lightning Network are viable alternatives to Storjcoin, in which case just a few words are missing.
https://github.com/Storj/whitepaper/blob/master/Storj%20Whitepaper%20V2.tex#L357
Application developers then delegate trust to the Bridge, exactly as they would to a traditional object store. This shifts significant operational burdens from the application developer to the service-provider with minimal trust delegation.
This statement says that application developers delegate trust to the Bridge in exactly the same way as they would to a traditional object store, but also that they delegate less trust than they would to a traditional object store. The two claims seem to contradict each other.