ipfs-cluster / ipfs-cluster
Pinset orchestration for IPFS
Home Page: https://ipfscluster.io
License: Other
Participants:
We itemized all the pieces we need to build, and figured out next steps.
These are all the components that we need to define or build. We need to:
Links:
We need to create use case scenarios
We need to do the following
2016-07-06 17:00Z
It needs to look like it's the IPFS API regarding response format too.
Here are a few questions I have had at various points while working out how ipfs-cluster works.
What exactly are peers communicating to each other?
What is the division of labor between an ordinary peer and the cluster leader? What extra work does the cluster leader do?
What is the purpose of bootstrapping in ipfs-cluster-service? Is this the way for a single node to begin its own cluster?
What is the comment "// The only way I could make this work" in ipfs-cluster/main.go's init function referring to?
What is the purpose of the --leave flag, and why can a node still be considered part of the cluster when its ipfs-cluster-service process is no longer running? Why is --leave not the default? It sounds like a node that does not leave on shutdown can lead to problems, so what is the beneficial use case of keeping it in the cluster?
From context clues it seems like ipfs-cluster hopes, in the future, to provide pluggable options for consensus, ipfs connections and monitors. Is this the case? If so, what is the purpose of having these options?
ipfshttp appears to communicate with an ipfs daemon listening on a local port through an HTTP API. Does the ipfs daemon have other methods of communication and it's just a matter of cluster not implementing clients for them? I suspect this is the case because the ipfsconn folder seems to abstract ipfs connections away from an HTTP API; however, the service binary directly calls ipfshttp, so maybe the ipfsconn/ipfshttp hierarchy was intended to mean something else.
Thanks, and more to follow!
The coreos/etcd rafthttp Raft HTTP implementation (https://github.com/coreos/etcd/tree/master/rafthttp or https://godoc.org/github.com/coreos/etcd/rafthttp) can use TLS to secure communication, which in turn is capable of mandating that the client be authenticated, allowing you to specify a CA to validate client certificates (https://godoc.org/crypto/tls#Config). We could use this to our advantage to add authentication to ipfs-cluster by having the cluster accept a configured certificate and a certificate authority that it uses to keep the cluster tight. Kubernetes or similar would be responsible for the CA and certificate generation.
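As a rough illustration of that setup, here is a minimal Go sketch of a mutually-authenticated TLS configuration using only the standard library. The file names are illustrative and nothing here is ipfs-cluster's actual API; certificate provisioning is assumed to happen externally (e.g. via Kubernetes).

```go
// Sketch: a TLS config that only accepts clients whose certificates are
// signed by a cluster-wide CA. File paths and names are illustrative.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"io/ioutil"
	"log"
)

func clusterTLSConfig(certFile, keyFile, caFile string) (*tls.Config, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}
	caPEM, err := ioutil.ReadFile(caFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientCAs:    pool,                           // CA used to validate client certificates
		ClientAuth:   tls.RequireAndVerifyClientCert, // reject unauthenticated peers
		RootCAs:      pool,                           // validate servers against the same CA
	}, nil
}

func main() {
	cfg, err := clusterTLSConfig("peer.crt", "peer.key", "cluster-ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	_ = cfg // would be passed to the rafthttp transport / HTTP server as appropriate
}
```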
Especially since those calls are concatenated with a status call.
It would be great to have an option for auto-pinning objects that are added/pinned with the ipfs daemon.
Steps:
A Pin should not need to be pinned on every cluster member. We should be able to say that a pin needs to be pinned on 2 or 3 cluster members.
We will start with a general replication factor for all pins, then maybe transition to replication factor per-pin.
These are thoughts for the first approach.
A replication factor of -1 means pin everywhere. If the replication factor is larger than the number of cluster peers, it is treated as being that large.
We need a PeerMonitor component which is able to decide, when a pin request arrives, which peer comes next. The decision should be based on pluggable modules: for a start, we will provide one which attempts to evenly distribute the pins, although it should easily support other metrics like disk space etc.
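A minimal sketch of both ideas above (capping the replication factor and evenly distributing pins), using hypothetical helper functions rather than the real ipfs-cluster components:

```go
// Sketch of the two ideas above: normalizing the replication factor and an
// allocation strategy that evenly distributes pins. Types are hypothetical,
// not the real ipfs-cluster components.
package allocator

import "sort"

// effectiveReplication maps the configured factor onto the cluster size:
// -1 means "pin everywhere", and anything larger than the cluster is capped.
func effectiveReplication(factor, clusterSize int) int {
	if factor == -1 || factor > clusterSize {
		return clusterSize
	}
	return factor
}

// allocateEvenly picks the peers with the fewest pins so load stays balanced.
// pinCounts maps peer ID -> number of pins the peer currently holds.
func allocateEvenly(pinCounts map[string]int, wanted int) []string {
	peers := make([]string, 0, len(pinCounts))
	for p := range pinCounts {
		peers = append(peers, p)
	}
	sort.Slice(peers, func(i, j int) bool {
		return pinCounts[peers[i]] < pinCounts[peers[j]]
	})
	if wanted > len(peers) {
		wanted = len(peers)
	}
	return peers[:wanted]
}
```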
Every commit log entry asking to Pin something must be tagged with the peers which are in charge. The Pin Tracker will receive the task and, if it is itself tagged on a pin, it will pin. Otherwise it will store the pin and mark it as remote.
If the PinTracker receives a Pin which is already known, it should unpin if it is no longer tagged among the hosts in charge of pinning. Somewhere in the pipeline we should probably detect re-pinnings and avoid changing pinning peers unnecessarily.
Unpinning works as usual, removing the pin only where it is pinned.
The peer monitor should detect hosts which are down (or hosts whose ipfs daemon is down). Upon a certain time threshold (say 5 minutes, configurable), it should check the status for pins assigned to that host and re-pin them to new hosts.
The peer monitor should also receive updates from the peer manager and make sure that there are no pins assigned to hosts that are no longer in the cluster.
For the moment there is no re-balancing when a node comes back online.
This assumes there is a single peer monitor for the whole cluster. While monitoring the local ipfs daemon could be done by each peer (and triggering rebalances for that), if all nodes watch each other this will cause havoc when triggering rebalances. The Raft cluster leader should probably be in charge then, but this conflicts with being completely abstracted from the consensus algorithm below. If we had a non-leader-based consensus we could assume a distributed lottery to select someone. It makes no sense to re-implement code to choose a peer from the cluster when Raft already has it all. Also, running the rebalance process in the Raft leader saves a redirection for every new pin request.
We need to attack ipfs-cluster-ctl to provide more human-readable outputs as the API formats become more stable. status should probably show succinctly which pins are under-replicated or which peers are in error, one line per pin.
This is a quick rundown of a first-ever user trying the ipfscluster tooling. Take it with a pile of salt: I noted down everything I ran into, including a bunch of problems that are obviously just polish not meant to be there yet, and other stuff which is probably meant to be experimental or a shortcut for now. I noted everything I ran into to give feedback on what was intuitive and what wasn't, first reactions, etc. My proper review is just beginning -- this is just a first stab at playing with it.
Already, I think some work can be done on the connectivity side of things. Here are the basic points. Maybe these can be extracted to separate issues later, but keeping them here to retain the context.
- ipfs members ls or something like it would show connectivity status, particularly both: (the ipfs swarm command. In reality, that should be a libp2p thing, so maybe we can lift that command out of go-ipfs and into go-libp2p, so that ipfscluster may take advantage of it). #15
- ipfs swarm connect <multiaddr>. #16
- Also, some docs on what log levels or log modules I should listen to to figure things out would be good. E.g. if I want to debug connectivity, or the consensus stuff, or the interactions with the ipfs connector, what level should I --loglevel. May be good to support per-module logging (I think go-ipfs has this with an ENV var or something, I don't recall. It's useful to isolate a module and hear its debug output only). #18 #19
- ipfscluster: go get -u -- why should I mess up my system for you? #20
- $(pwd)/.ipfs-cluster instead of $(whoami)/.ipfs-cluster, because in server-side installations (more typical for clusters), user directories are not the typical place stuff is installed/stored. This is departing from the convention of go-ipfs and IPFS_PATH, but I think that's fine. #22
> ipfscluster-server init
error loading configuration: open /Users/earth/.ipfs-cluster/server.json: no such file or directory
- ipfscluster-server init --config .ipfs-cluster/server.json did not work; took me a bit to realize the global flag had to be before the subcommand name.
- ipfscluster-server --config .ipfs-cluster/server.json init worked as before (back to failing the same way as above).
- mkdir .ipfs-cluster is not enough. Still the same error.
- touch .ipfs-cluster/server.json gets further, but crashes the process:
panic: multihash too short. must be > 3 bytes #22
goroutine 1 [running]:
panic(0x682b60, 0xc420074b20)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/ipfs/ipfs-cluster.NewMapPinTracker(0xc4201ef130, 0xc42016baa0)
/Users/earth/go/src/github.com/ipfs/ipfs-cluster/map_pin_tracker.go:45 +0x370
main.main()
/Users/earth/go/src/github.com/ipfs/ipfs-cluster/ipfscluster-server/main.go:134 +0x221
- echo '{}' > .ipfs-cluster/server.json did not work. Same error.
- init is an option, not a command! (ipfscluster-server -init, not ipfscluster-server init). That took a while.
- I'm used to git init, ipfs init, etc., and to "commands" being in subcommand notation, not options. (I know golang flags isn't good about this.)
- -f to overwrite. Having to rm manually is annoying, and it's less automation friendly. #21
ipfscluster
- ipfscluster has a short command listing, nice.
- COMMANDS section in -h.
- ipfscluster member lacks ipfscluster member add. #23
- ipfscluster tool to do everything (launch the ipfscluster-server, launch ipfs daemon, a local one or a global one).
```js
{
  // ...
  "ipfs_api_addr": "127.0.0.1",
  "ipfs_api_port": 9095,
  "ipfs_addr": "127.0.0.1",
  "ipfs_port": 5001 // this is an ipfs_api, so hard to distinguish from ipfs_api_port.
                    // maybe it should be "ipfs_node_port". #22
  // ...
}
```
"ipfs_cluster_api": "/ip4/127.0.0.1/tcp/9095/http",
"ipfs_node_api": "/ip4/127.0.0.1/tcp/5001/http",
"node"
. (becauser the cluster one is a node, and the cluster one is really the cluster api)."underlying_ipfs_node_api"
is more clear, but long. and "underlying"
is not that good of a word. maybe we should clarify the relationship between the cluster (and the ipfs-node it represents) and the sub ipfs-node. maybe "parent/child" works for this, because it works with the tree recursive structure?
"cluster_node_api": "/ip4/127.0.0.1/tcp/9095/http",
"child_node_api": "/ip4/127.0.0.1/tcp/5001/http",
// or
"parent_cluster_api": "/ip4/127.0.0.1/tcp/9095/http",
"child_node_api": "/ip4/127.0.0.1/tcp/5001/http",
cluster_peers
. but now i'm not sure how.
- /ip4/127.0.0.1/tcp/9095/http? or /ip4/127.0.0.1/tcp/9095? or what? (maybe /ip4/127.0.0.1/tcp/9095/ipfs/Qmfoobarbazpk...). "/ip4/%s/tcp/%s/ipfs/%s", config.api_addr, config.api_port, config.id.
- ipfscluster id similar to ipfs id. #21
- /p2p protocol prefix, not /ipfs, would be less confusing here.
- config.cluster_addr, config.cluster_port that we want for this.
- ipfscluster. Let's do it manually for now. #21
- ipfscluster member ls sweet.
// ipfscluster
Error 500: leader unknown or not existing yet
---
// ipfscluster-server logs
15:14:24.627 INFO cluster: pinning:QmcskskhwkUFh1vvZbGFhBJhVMvzg6Hx44niysaoiiQGVt cluster.go:275
15:14:24.627 ERROR libp2p-rpc: leader unknown or not existing yet client.go:125
15:14:24.627 ERROR cluster: sending error response: 500: leader unknown or not existing yet rest_api.go:396
ipfscluster members ls shows (2, 2, 3), instead of (3, 3, 3).
ipfscluster members ls shows (3, 3, 3)
> ipfscluster pin add <cid>
Request accepted
---
// ipfscluster-server logs
15:21:08.581 ERROR libp2p-raf: QmbGvizLZHVWto8ZWU2tbkNcV6W92G6AggKdPfx5gFbLZz: Pipeline error: EOF transport.go:716
// is this bad? o/
15:24:19.916 INFO cluster: pinning:QmcskskhwkUFh1vvZbGFhBJhVMvzg6Hx44niysaoiiQGVt cluster.go:275
15:24:19.963 INFO cluster: pin commited to global state: QmcskskhwkUFh1vvZbGFhBJhVMvzg6Hx44niysaoiiQGVt consensus.go:267
15:24:20.348 INFO cluster: IPFS object is already pinned: QmcskskhwkUFh1vvZbGFhBJhVMvzg6Hx44niysaoiiQGVt ipfs_http_connector.go:205
- ipfscluster pin ls: #2 and #3 have it, #1 does not. Probably that pipeline error, got disconnected... but ipfscluster members ls still shows 3 for everyone, though I think #1 disconnected.
- On #1, ipfscluster members ls shows just 1. OK, need to reconnect #2 and #3.
- 2016/12/30 18:30:23 [INFO] snapshot: Creating new snapshot at /home/jbenet/.ipfs-cluster/data/snapshots/421-8-1483140623423.tmp took a while.
- ipfscluster members ls shows (3, 2). Probably from #3 before it panicked. OK, restart everything.
- ipfs pin ls <cid> ... stuck in 2 machines. Looks like it's iterating over the entire damn pinset, hanging the machine...
- ipfs refs local | grep <cid> shows it in #3 (where it was added), but not on #2 nor on #1. Looks like the cluster server knows about the pin, but it did not translate to the child ipfs node I had running in those machines... so it did not pin it.
- ipfscluster-server logs should maybe show whether it found + can connect to the child ipfs node.
> ipfscluster status
cid: QmcskskhwkUFh1vvZbGFhBJhVMvzg6Hx44niysaoiiQGVt
status:
QmTHEzZHGTSiVFFM2h3TgFCSsp2Ecq82U6heAxK7jJRijF: (#2)
ipfs: pinning
QmUmQ2DRe2keGN8meXXLWjUgGbyiBLPWJFXGi4c2kfDGJb: (#1)
ipfs: pin_error
QmbGvizLZHVWto8ZWU2tbkNcV6W92G6AggKdPfx5gFbLZz: (#3)
ipfs: pinned
- (ipfs ping).
- ipfscluster status now shows pinning -> pin_error on #2.
- Not sure whether it stays in pin_error forever, or whether the cluster will try to get the node to repin.
18:46:41.588 WARNI cluster: IPFS unsuccessful: 500: Path 'QmcskskhwkUFh1vvZbGFhBJhVMvzg6Hx44ed' not pinned
- ipfscluster status panicked. Then #2 panicked: https://gist.github.com/jbenet/e04c59731ce33a3522603efa7a22f3d3
- ipfscluster members ls shows (3, 3, 3).
- ipfs pin ls <cid> shows the pin on all 3! \o/ Yay.
- (I ran ipfs pin rm <cid> in the child manually, hoping the cluster will notice the pin fail).
- ipfs refs local | grep <cid> shows the pin. Yay!
- ipfscluster pin ls shows the second pin, but no longer the first... the 2nd contains the 1st, but these should not be coalesced.
pin A, pin B, unpin B, and I expect pin A to remain; whether or not B contains A is irrelevant.
Added C, and now pin ls shows both B and C.
Me when the pins succeeded:
[0] Notes on go packaging. (TL;DR: use gx-go. This is the expanded "why use gx-go".) Warning: this is a contrarian view with respect to the Go language, and a standard, sane view from the perspective of package management, version control, and secure open source. Go packaging is designed for monolithic codebases (a well-tended sequoia), not open source (a haphazard, expansive brush forest). Go was designed at Google, baking into the language many of the software engineering practices of Google. In general this is a great thing. In the cases where open source != how Google develops, it is not. Google has a single, huge tree of code, with atomic safe updates. You cannot merge something into the tree if ANYTHING across all (most) of Google fails to compile/errors. Open source is fundamentally different. There is no such atomic-safe-update gating. We cannot assume other people's systems are set up like ours, or that they want to update their tree to the version we require (running go get -u for the user may be harmful to them). Or that we know who is depending on our code (lots of private code may depend on our package). Or that whoever is running a package we depend on won't screw everything up by moving something or breaking an API. Go uses location addressing for package identification... not just inside a single diligent org (which works really well) but in the broader internet (which can fail catastrophically). Despite years of heated arguments on this, the Go team has not yet understood this is a real problem (they washed their hands of it by having an external committee handle it). But that's OK, because we are the people who use hash-linking to securely address everything. Let's use it to our advantage!
When there is a cluster with, let's say, two peers and one or both of them get shut down for some reason, do the peers stay remembered and reconnect automatically on startup?
I suppose they are automatically saved into the service.json file in the cluster peers section?
Thanks !
This wraps documentation in general:
Currently ipfs-cluster-service only handles SIGINT (as in ctrl-c). Killing the process with kill produces a dirty shutdown.
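A minimal sketch of what handling SIGTERM in addition to SIGINT could look like, using only the standard library; the shutdown call is a stand-in for the real cleanup path:

```go
// Sketch: trapping both SIGINT and SIGTERM so that a plain `kill` also
// triggers a clean shutdown. Standard library only.
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)

	// ... start the cluster service here ...

	s := <-sigCh
	log.Printf("received %s, shutting down cleanly", s)
	// cluster.Shutdown() // stand-in for the real cleanup path
}
```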
- go get -u gx: use $(shell which gx) or download to a local install.
- All IPFS nodes associated to cluster nodes should have connectivity among themselves. We should trigger swarm connect commands to all other known nodes.
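One possible way to trigger these connects is through the IPFS daemon's HTTP API (/api/v0/swarm/connect). The sketch below assumes the API listens on 127.0.0.1:5001 and uses placeholder peer multiaddrs; depending on the go-ipfs version, the endpoint may need to be called with GET instead of POST.

```go
// Sketch: asking the local IPFS daemon to connect to the other cluster peers'
// IPFS nodes via its HTTP API. Addresses below are placeholders.
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"
)

func swarmConnect(apiBase string, addrs []string) {
	for _, addr := range addrs {
		u := fmt.Sprintf("%s/api/v0/swarm/connect?arg=%s", apiBase, url.QueryEscape(addr))
		resp, err := http.Post(u, "", nil)
		if err != nil {
			log.Printf("connect to %s failed: %s", addr, err)
			continue
		}
		resp.Body.Close()
		log.Printf("swarm connect %s: %s", addr, resp.Status)
	}
}

func main() {
	peers := []string{ // placeholder multiaddrs of the other peers' IPFS nodes
		"/ip4/10.0.0.2/tcp/4001/ipfs/QmPeerTwo",
		"/ip4/10.0.0.3/tcp/4001/ipfs/QmPeerThree",
	}
	swarmConnect("http://127.0.0.1:5001", peers)
}
```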
I'm supposed to have 1.5.2 and it gives me
./ipfs-update --version
ipfs-update version 1.5.1
:(
I think this happens because the monitor only throws alerts for the current set of peers, so it will not complain about a peer that has been removed from the cluster. It should not be like this. Remember to add a test.
Move components to subpackages. Make sure they log to different facilities.
bleve is like Elasticsearch, but 100% Go.
It's mature and many projects use it.
For IPFS it would provide very powerful search.
It's also very easy to integrate.
dgraph got it in in just a week, for example.
Please at least consider this and discuss.
Currently Cluster can allocate content to a number of peers but will not detect failures and re-allocate in that case.
Allocation is based on metrics which are regularly pushed to the Leader. If the last metric from a peer has expired or is invalid, the peer is not considered an available allocation when pinning. When re-pinning content, this situation is also detected and a new allocation will be found, so that part is done.
The idea is then to give the PeerMonitor the task of producing Alerts on a channel which the main Cluster component listens on. When the PeerMonitor detects that a peer is down (because, e.g., its last metric has expired), it sends an alert. Cluster will then find which CIDs are allocated to the problematic peer and re-trigger Pin operations for each.
This implies PeerMonitors should be made aware of the current clusterPeers (or be aware themselves via the RPC API to the pinManager).
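A rough sketch of that alert flow, with hypothetical stand-in types (not the real ipfs-cluster components):

```go
// Sketch of the alert flow described above: the monitor pushes alerts on a
// channel and the main cluster component listens and re-triggers pins.
// All types here are simplified stand-ins.
package cluster

// Alert signals that a peer looks down (e.g. its last metric expired).
type Alert struct {
	Peer string
}

type PeerMonitor interface {
	Alerts() <-chan Alert
}

type Cluster struct {
	monitor PeerMonitor
	// cidsFor returns the CIDs currently allocated to a peer (stand-in).
	cidsFor func(peer string) []string
	// pin re-triggers a pin so a new allocation is found (stand-in).
	pin func(cid string)
}

// watchAlerts runs in its own goroutine and reacts to monitor alerts.
func (c *Cluster) watchAlerts(done <-chan struct{}) {
	for {
		select {
		case a := <-c.monitor.Alerts():
			for _, cid := range c.cidsFor(a.Peer) {
				c.pin(cid) // re-pinning will skip the peer whose metric expired
			}
		case <-done:
			return
		}
	}
}
```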
It should be possible to add and remove peers from the Cluster while the cluster is running.
Bonus points for auto-discovery of cluster members from a given seed.
Need an easy way to list all libp2p nodes involved in cluster (members and IPFS) and see what's connected to what (ideally everything is connected to everything).
Travis tests fail sometimes in random places (usually tests around replication). It has usually proven useful to increase delays, but we should really look closer into it.
One way I'd help with IPFS' adoption would be to lend 100-500GB of my spare hard disk space. I'd like to simply be able to start up an IPFS software piece and instruct it to be in "lending mode" -- I don't care what gets hosted on my machine if it helps the network, to put it bluntly.
EDIT: I can't see a way to apply the user-story label here.
I figured we should put a reading-list together to cover a lot of the concepts relevant to cluster. LINKS ONLY PLEASE, don't add files.
We'll want to touch on:
I thought up a possible architecture for a Reed-Solomon (or other erasure coding algorithm) layer on top of IPFS. My notes are here. @hsanjuan informed me that ipfs-cluster was already a tool that was planned, and that this architecture could slot into ipfs-cluster, so I should raise an issue here. One of the key points of the system is that there would be IPFS nodes that provide IPFS files that they do not have locally, but instead have to generate by accessing other files from the IPFS network and re-combining them.
Users should be able to start an ipfs-cluster node and have it join a pinning ring, that is, an existing set of nodes. These nodes would be archiving some interesting material for the participants. The newcomer should have an easy way to join the effort.
For this to work:
Current state and considerations:
The main key here is to understand what is the trust model in a pinning ring, how a pinning ring member gets trusted and loses the trust, and who can take those actions.
It would be helpful to point to the prebuilt binaries on dist.ipfs.io in the install section of the README, etc.
There are a number of pain points if the consensus state format changes upon an upgrade.
Currently, persistence is obtained via Raft snapshots, which are loaded on boot and written on shutdown (at least). The Raft snapshot format comes from the go-libp2p-raft FSMSnapshot implementation, which is just a serialization of the state using msgpack.
If the state changes, loading the snapshot is likely to break. Also, this format is unreadable to the user and hard to work with.
A few thoughts about tackling this:
- State (a migration). This is potentially tricky on a large cluster; see the sketch below.
- go-libp2p-consensus rollbacks are not very specific, and this would work only because it's the way go-libp2p-raft does it at the moment.
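One possible direction, sketched with hypothetical types: wrap the serialized state in a version number so an old snapshot can be detected and migrated before it is loaded. The state shape and migration step here are purely illustrative.

```go
// Sketch: a versioned state wrapper so a future ipfs-cluster can detect an
// old snapshot and run migrations before loading it. Types are hypothetical.
package state

import "errors"

const currentVersion = 2

type versionedState struct {
	Version int
	Pins    map[string][]string // cid -> allocated peers (illustrative shape)
}

// upgrade migrates older snapshot formats to the current one, one step at a time.
func upgrade(s *versionedState) error {
	for s.Version < currentVersion {
		switch s.Version {
		case 1:
			// example migration: pretend v1 had no allocations, so default them
			for cid := range s.Pins {
				if s.Pins[cid] == nil {
					s.Pins[cid] = []string{} // empty slice meaning "pin everywhere"
				}
			}
			s.Version = 2
		default:
			return errors.New("unknown state version")
		}
	}
	return nil
}
```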
At Climate Mirror, we deal with a large amount of data on a few, unfortunately centralized servers. However, a new initiative, Our Data Our Hands (ourdataourhands.org), hopes to shift the burden of storing data to everybody and their grandmothers' computers. We hope to accomplish this by providing Docker containers that, once started, join a cluster of similar peers and contribute storage space. We will also sell pre-rolled hardware with large hard drives that can help stabilize the network.
In order to accomplish this, we need a few things:
Some of these are dreamy (threshold signatures to keep us from creating a massive botnet under one person's control) but others are fairly important, like 1, 3, 5, and 6.
Currently it just shows IDs of cluster members. It should:
This implies making some components PeerAware and relaying any changes to the peer sets. Need to investigate how Raft behaves when altering the peers.
Some thoughts about it.
Currently we let the allocation strategy dictate where content should be pinned, even when it comes from an intercepted /add request in the ipfs proxy. That means it could be allocated somewhere else, and thus content might need to be transferred one more time than if one of those allocations were the peer on which it was added.
If we forced one of the allocations to be the same peer where it was added:
Note that this can also be part of a pinning strategy in an allocator, where candidate peers are allocated content if they are already pinning it.
They set the configuration key and then configuration is saved upon exiting, thus becoming permanent.
At some point, move ipfs-clusterd (that should be the tool name) into its own repo (ipfs/go-ipfs-clusterd). This doesn't have to be now, but let's keep this repo for prototyping and general discussion. ipfs-cluster will mean a few different repos.
Currently Pin/Unpin in the main component call RPC on the Leader() directly, but they should not be part of the specifics of the consensus protocol below.
LogPin and LogUnpin in the consensus component should call the RPC on the Leader instead.
Given a single existing cluster member, a new cluster node should be able to set itself up, retrieve and connect to all members of the cluster.
Note the trickiness of this:
ipfscluster-server
- -config PATH and not -config string (not possible with vanilla flag, I think)
- make ipfscluster-server -init a subcommand too: ipfscluster-server init
ipfscluster
- ipfscluster id
I read in the meeting notes that you're asking for use cases. Here is my very personal use case and wishlist for ipfs-cluster. I'm not 100% sure if this doesn't go beyond the scope of ipfs-cluster, but I'll just write it down here anyway.
I want to replace any distributed filesystem I currently have in use with ipfs. I'm using XtreemFS (because it works well over WAN) and have been using AndrewFS, GlusterFS and HDFS (without WANdisco) in the past.
My first use case is to store home directories in ipfs clusters and these are the features that I would really like to have:
So how would I use the above?
I would have a company-wide cluster where everyone can access their home directories from everywhere. The cluster would have sub-clusters which represent the different sites. Those would be connected over WAN. Inside each sub-cluster there would be nodes which are locally connected over LAN. I don't want to just have dedicated machines building up the cluster, but also each and every "client", which is why the client-side quota limit is important in my opinion.
I hope this is the same vision that you have for ipfs-cluster. I think it's a pretty common use case for a distributed filesystem.
Need to automatically bring up a real cluster, in a real cloud/hardware environment, with real IPFS, and perform a number of standard cluster workloads.
Any measures extracted from these tests can be used as future reference regarding the performance of the Cluster.
We'll need these other NEW repos:
ipfs
libp2p
may need others.
I need to add a captain's log.
$(pwd)/.ipfs-cluster -- I don't like this. It's relative, prone to user error, and departs from IPFS convention. If anything, configurations live in /etc/ on any standard system. Local configs are usually in .config/<app-name> these days.
I prefer the JavaScript style because that's the usual convention with JSON. It is also related to how API responses are formatted. Look:
Handle panics/errors when configuration is really invalid (empty node ID etc).
Rename ipfs_port to ipfs_node_port and so on.
Use multiaddress format like:
```
"ipfs_cluster_api": "/ip4/127.0.0.1/tcp/9095/http",
"ipfs_node_api": "/ip4/127.0.0.1/tcp/5001/http",
```
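If the configuration moves to multiaddresses, the values still have to be resolved into host:port pairs for the HTTP clients. A minimal stdlib-only sketch of that step (a real implementation would more likely use the go-multiaddr library):

```go
// Sketch: turning a /ip4/<host>/tcp/<port>/http multiaddr from the config
// into a plain host:port for an HTTP client. A real implementation would
// likely use go-multiaddr instead of string handling.
package config

import (
	"fmt"
	"strings"
)

func hostPortFromMultiaddr(maddr string) (string, error) {
	parts := strings.Split(strings.Trim(maddr, "/"), "/")
	// expected shape: ip4 <host> tcp <port> [http]
	if len(parts) < 4 || parts[0] != "ip4" || parts[2] != "tcp" {
		return "", fmt.Errorf("unsupported multiaddr: %s", maddr)
	}
	return parts[1] + ":" + parts[3], nil
}

// e.g. hostPortFromMultiaddr("/ip4/127.0.0.1/tcp/5001/http") -> "127.0.0.1:5001"
```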
CLI apps should be tested by sharness, at least to check that they are not utterly broken.
- Also, ^C on one node, then ^C on the other hangs; it looks like it's trapping the exit signal and waiting for the other members to respond, so it's stuck. Can't kill it, only kill -9.
ctrl-c in one node causes some errors. Raft is for sure going to complain about this (as it should). Gotta investigate if there are further problems.
Participants:
@whyrusleeping
@hermanjunge
@jbenet
@christianlundkvist
daemon startup:
Api endpoints:
Tasks:
Here's some feedback from a user session. We tried it a bit -- got stuck, then we moved to something else, but at least we got some info.
- install.sh script: maybe install-local.sh, if install.sh is misleading
- ctl := ipfs-cluster-ctl
- service := ipfs-cluster-service
- ctl id
- ctl status may want to be ctl pin status
- ctl peers ls looks good
- ctl pin: pin (commits the pin to the whole cluster), pin complete (the pin has actually completed on enough of the cluster to count as consensus)
- ctl peers ls on the leader takes a while
- ctl peers ls does not say what address it is using to connect to each peer.
It's very inefficient now.
Let users download and install it easily.
The replication factor feature (#46) is ready (as described in the Captain's log). This adds the possibility of adding different pinning strategies.
Currently we only have a dummy numpin Informer and a numpinalloc PinAllocator for it.
It would be really useful to have other informers, which fetch different metrics, and other allocators which implement different strategies, for example a disk-space metric and allocator.
They just need to implement the Informer interface and the Allocator interface respectively. The existing examples show how this is done in a simple way.
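For illustration, here is a rough sketch of what a disk-space informer and a matching allocation strategy could look like. The types are simplified stand-ins, not the actual ipfs-cluster Informer/Allocator interfaces, which should be consulted before writing a real one.

```go
// Sketch of a disk-space informer and matching allocation strategy.
// Interfaces and types are simplified stand-ins.
package diskinformer

import "sort"

// Metric is a simplified stand-in for the metric an informer pushes.
type Metric struct {
	Name  string
	Peer  string
	Value uint64 // free bytes in this example
}

// DiskInformer reports free disk space for the local peer.
type DiskInformer struct {
	Peer      string
	FreeBytes func() uint64 // platform-specific probe, injected for the sketch
}

// GetMetric produces the "freespace" metric this informer is responsible for.
func (d DiskInformer) GetMetric() Metric {
	return Metric{Name: "freespace", Peer: d.Peer, Value: d.FreeBytes()}
}

// AllocateByFreeSpace orders candidate peers by free space, most free first,
// so pins land on the peers with the most room.
func AllocateByFreeSpace(metrics []Metric) []string {
	sort.Slice(metrics, func(i, j int) bool {
		return metrics[i].Value > metrics[j].Value
	})
	peers := make([]string, len(metrics))
	for i, m := range metrics {
		peers[i] = m.Peer
	}
	return peers
}
```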