otoolep / hraftd

A reference use of Hashicorp's Raft implementation

Home Page: http://www.philipotoole.com/building-a-distributed-key-value-store-using-raft/

License: MIT License

Language: Go (100.00%)

Topics: distributed-systems, raft, hashicorp-raft, key-value, go, consensus

hraftd's Introduction

For background on this project, check out this blog post.

hraftd

hraftd is a reference example use of the Hashicorp Raft implementation. Raft is a distributed consensus protocol, meaning its purpose is to ensure that a set of nodes -- a cluster -- agree on the state of some arbitrary state machine, even when nodes are vulnerable to failure and network partitions. Distributed consensus is a fundamental concept when it comes to building fault-tolerant systems.

A simple example system like hraftd makes it easy to study the Raft consensus protocol in general, and Hashicorp's Raft implementation in particular. It can be run on Linux, macOS, and Windows.

Reading and writing keys

The reference implementation is a very simple in-memory key-value store. You can set a key by sending a request to the HTTP bind address (which defaults to localhost:11000):

curl -XPOST localhost:11000/key -d '{"foo": "bar"}'

You can read the value for a key like so:

curl -XGET localhost:11000/key/foo
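
The same operations can be done from Go with just the standard library. The sketch below is illustrative only; it assumes the default HTTP bind address of localhost:11000 and the /key endpoints shown above.

package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	// Set a key by POSTing a JSON object to /key.
	resp, err := http.Post("http://localhost:11000/key", "application/json",
		strings.NewReader(`{"foo": "bar"}`))
	if err != nil {
		panic(err)
	}
	resp.Body.Close()

	// Read the value back with a GET to /key/<key>.
	resp, err = http.Get("http://localhost:11000/key/foo")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // prints the JSON returned by hraftd
}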

Running hraftd

Building hraftd requires Go 1.20 or later. gvm is a great tool for installing and managing your versions of Go.

Starting and running a hraftd cluster is easy. Download and build hraftd like so:

mkdir work # or any directory you like
cd work
export GOPATH=$PWD
mkdir -p src/github.com/otoolep
cd src/github.com/otoolep/
git clone git@github.com:otoolep/hraftd.git
cd hraftd
go install

Run your first hraftd node like so:

$GOPATH/bin/hraftd -id node0 ~/node0

You can now set a key and read its value back:

curl -XPOST localhost:11000/key -d '{"user1": "batman"}'
curl -XGET localhost:11000/key/user1

Bring up a cluster

A walkthrough of setting up a more realistic cluster is here.

Let's bring up 2 more nodes, so we have a 3-node cluster. That way we can tolerate the failure of 1 node:

$GOPATH/bin/hraftd -id node1 -haddr localhost:11001 -raddr localhost:12001 -join :11000 ~/node1
$GOPATH/bin/hraftd -id node2 -haddr localhost:11002 -raddr localhost:12002 -join :11000 ~/node2

This example shows each hraftd node running on the same host, so each node must listen on different ports. This would not be necessary if each node ran on a different host.

The -join option tells each new node to join the existing node. Once joined, each node now knows about the key:

curl -XGET localhost:11000/key/user1
curl -XGET localhost:11001/key/user1
curl -XGET localhost:11002/key/user1

Furthermore, you can add a second key:

curl -XPOST localhost:11000/key -d '{"user2": "robin"}'

Confirm that the new key has been set like so:

curl -XGET localhost:11000/key/user2
curl -XGET localhost:11001/key/user2
curl -XGET localhost:11002/key/user2
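
Under the hood, the -join option used above causes the new node to announce its ID and Raft address to the existing node, which then adds it to the Raft configuration. Below is a minimal sketch of that step using hashicorp/raft's AddVoter call; the type and method names are illustrative, not hraftd's exact code.

package store

import (
	"github.com/hashicorp/raft"
)

// Store sketches only the Raft handle needed below.
type Store struct {
	raft *raft.Raft
}

// Join adds a node, identified by nodeID and reachable at the given Raft
// address, to the cluster as a voting member. Only the leader can apply
// this configuration change; on a follower the returned future errors.
func (s *Store) Join(nodeID, addr string) error {
	f := s.raft.AddVoter(raft.ServerID(nodeID), raft.ServerAddress(addr), 0, 0)
	return f.Error()
}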

Stale reads

Because any node will answer a GET request, and nodes may "fall behind" updates, stale reads are possible. Again, hraftd is a simple program, built to demonstrate a distributed key-value store. If you are particularly interested in learning more about this issue, you should check out rqlite. rqlite allows the client to control read consistency, trading off read-responsiveness against correctness.

Read-consistency support could be ported to hraftd if necessary.
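
As a rough sketch of what such support might look like, one option is to serve reads only from the node that believes it is the leader. The Store shape below mirrors hraftd's in-memory map and mutex, but the method itself is hypothetical.

package store

import (
	"fmt"
	"sync"

	"github.com/hashicorp/raft"
)

// Store sketches the relevant parts of an hraftd-like store: a mutex-guarded
// in-memory map plus the Raft handle.
type Store struct {
	mu   sync.Mutex
	m    map[string]string
	raft *raft.Raft
}

// GetFromLeader refuses the read unless this node currently believes it is
// the leader, which narrows the window for stale reads.
func (s *Store) GetFromLeader(key string) (string, error) {
	if s.raft.State() != raft.Leader {
		return "", fmt.Errorf("not leader")
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.m[key], nil
}

Note that this only narrows the stale-read window: a deposed leader may briefly still believe it is the leader, which is why stricter schemes also confirm leadership with a quorum, or route the read through the log, before answering.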

Tolerating failure

Kill the leader process and watch one of the other nodes be elected leader. The keys are still available for query on the other nodes, and you can set keys on the new leader. Furthermore, when the first node is restarted, it will rejoin the cluster and learn about any updates that occurred while it was down.

A 3-node cluster can tolerate the failure of a single node, while a 5-node cluster can tolerate the failure of two nodes. However, a 5-node cluster requires the leader to contact a larger number of nodes before any change, e.g. setting a key's value, can be considered committed.
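
The arithmetic behind those numbers: a change is committed once a quorum of N/2 + 1 nodes (including the leader) has accepted it, so a cluster of N nodes tolerates the failure of N minus that quorum. A trivial illustration:

package main

import "fmt"

func main() {
	for _, n := range []int{3, 5, 7} {
		quorum := n/2 + 1       // nodes that must acknowledge a change
		tolerated := n - quorum // failures the cluster can survive
		fmt.Printf("%d-node cluster: quorum %d, tolerates %d failure(s)\n",
			n, quorum, tolerated)
	}
}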

Leader-forwarding

Automatically forwarding requests to set keys to the current leader is not implemented. The client must always send requests to change a key to the leader or an error will be returned.
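
Concretely, the write path on a non-leader simply refuses the request. The sketch below shows the typical shape of such a Set; it is an outline rather than hraftd's exact code, and the command struct and 10-second timeout are illustrative.

package store

import (
	"encoding/json"
	"fmt"
	"time"

	"github.com/hashicorp/raft"
)

// command mirrors the kind of payload applied to the Raft log for a write.
type command struct {
	Op    string `json:"op"`
	Key   string `json:"key"`
	Value string `json:"value"`
}

// Store sketches only the Raft handle needed for the leader check.
type Store struct {
	raft *raft.Raft
}

// Set outlines how a write might be handled: reject the request outright if
// this node is not the leader, otherwise apply the change through the Raft log.
func (s *Store) Set(key, value string) error {
	if s.raft.State() != raft.Leader {
		return fmt.Errorf("not leader")
	}
	b, err := json.Marshal(&command{Op: "set", Key: key, Value: value})
	if err != nil {
		return err
	}
	f := s.raft.Apply(b, 10*time.Second)
	return f.Error()
}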

Production use of Raft

For a production-grade example of using Hashicorp's Raft implementation, to replicate a SQLite database, check out rqlite.

hraftd's People

Contributors

boivie, otoolep, samuelramond, xujiajun

hraftd's Issues

a question about apply log in candidate node

I'm trying to apply a log entry on a candidate node. I know the request must be redirected to the leader for execution, but I don't want to do that over HTTP. Instead, I'd like to achieve it through communication within the cluster itself, but I don't know how. Can you give me some ideas? Thanks.

Extended example of how to share cleaned up state when a new peer joins?

Please forgive my using GitHub issues for asking a question. I've been looking for an example of how Raft should deal with a peer joining after log compaction to get the "current state".

Example:

  1. Happily running a cluster of any size.
  2. Lots of log entries (say, 10M) are taken on, consensus stays tight, and eventually the log can be compacted.
  3. New peer joins.

Should the new peer catch up on 10M events, or should it be given some snapshot of state to get caught up, and if so, how is that typically done?


Actually, my use case, and the reason I'm looking into Raft, is as a backend for an event-sourcing database, where the "my database is a stream of events" model maps very well onto Raft's distributed linear log concept. I'm not sure, however, whether one should snapshot and store binary logs, and have new peers get a binary snapshot of state rather than keeping all the messages in the history. I'm asking myself whether my use case even needs log compaction when the log is my database.
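
(For context, hashicorp/raft handles this through the FSM's Snapshot and Restore methods: when a follower, including a freshly joined one, is too far behind the compacted log, the leader ships its latest snapshot instead of replaying every entry. The sketch below is modelled loosely on a map-backed FSM like hraftd's and is illustrative rather than the project's exact code.)

package store

import (
	"encoding/json"
	"io"

	"github.com/hashicorp/raft"
)

// fsm is a map-backed finite state machine. Apply is omitted here; only the
// snapshot/restore path is sketched.
type fsm struct {
	m map[string]string
}

// Snapshot clones the current state. Raft calls this periodically, and the
// resulting snapshot is also what a lagging or newly joined follower receives
// instead of the full log.
func (f *fsm) Snapshot() (raft.FSMSnapshot, error) {
	clone := make(map[string]string, len(f.m))
	for k, v := range f.m {
		clone[k] = v
	}
	return &fsmSnapshot{store: clone}, nil
}

// Restore replaces the FSM's state from a snapshot stream, e.g. on a node
// that has just joined and received a snapshot from the leader.
func (f *fsm) Restore(rc io.ReadCloser) error {
	m := make(map[string]string)
	if err := json.NewDecoder(rc).Decode(&m); err != nil {
		return err
	}
	f.m = m
	return nil
}

type fsmSnapshot struct {
	store map[string]string
}

// Persist writes the snapshot to the sink provided by Raft.
func (s *fsmSnapshot) Persist(sink raft.SnapshotSink) error {
	if err := json.NewEncoder(sink).Encode(s.store); err != nil {
		sink.Cancel()
		return err
	}
	return sink.Close()
}

func (s *fsmSnapshot) Release() {}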

Additions

First off, thank you so much for posting this. It has been very helpful.

I've added a couple of features if you are interested. Particularly:

  • Optimization of the command struct for logs.
  • Forwarding requests to the leader (added an RPC server to the store), including join requests.
  • A KVStore interface to support other types of stores.

My ultimate goal is to provide a generic framework that allows others to embed a key-value store into their applications.

You can find the repo here: https://github.com/euforia/hraftd/tree/forwarding. This is still a work in progress.

Let me know what you think.

Update to hashicorp/raft 1.0.0

hashicorp/raft was recently released as version 1.0.0, and that's now on the master branch.

It's not backwards compatible with the old library, so adaptations are necessary.

Error messages when building:

~/r/go/src/github.com/otoolep/hraftd$ go build
# github.com/otoolep/hraftd/store
store/store.go:73:15: undefined: raft.NewJSONPeers
store/store.go:85:9: config.EnableSingleNode undefined (type *raft.Config has no field or method EnableSingleNode)
store/store.go:86:9: config.DisableBootstrapAfterElect undefined (type *raft.Config has no field or method DisableBootstrapAfterElect)
store/store.go:102:25: too many arguments in call to raft.NewRaft
store/store.go:161:21: cannot use addr (type string) as type raft.ServerAddress in argument to s.raft.AddPeer

hraftd won't start

Running $GOPATH/bin/hraftd -id node0 ~/node0 throws an error:
failed to open store: local bind address is not advertisable

Couple of questions about raft implementation

Hello! Thanks for your work on this implementation. I have wanted to understand Raft for a long time, and your work helped me get started. I have two questions:

  1. A 3-node situation: when we terminate the leader, another node becomes the leader, and the new leader periodically tries to connect to the terminated node. What if we terminated that node intentionally (forever) and don't want the new leader attempting to connect to it? Maybe I am missing some core idea here.

  2. Another question is about posting to the /key endpoint. You wrote in the README that leader-forwarding is not implemented. How can an external application find out the current leader's address in order to post new keys?
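
(Regarding the second question: hashicorp/raft exposes the current leader's Raft address via the Raft.Leader() method, so a deployment could surface it to clients, for example through a small, hypothetical HTTP endpoint like the one sketched below. Note that this reports the Raft bind address, not the HTTP API address, so a real setup would still need a mapping between the two.)

package httpd

import (
	"net/http"

	"github.com/hashicorp/raft"
)

// leaderHandler is a hypothetical endpoint (not part of hraftd) that reports
// the Raft address of the node currently believed to be the leader.
func leaderHandler(r *raft.Raft) http.HandlerFunc {
	return func(w http.ResponseWriter, req *http.Request) {
		// Leader returns the leader's Raft bind address, or "" if unknown.
		w.Write([]byte(string(r.Leader())))
	}
}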

Project description typo

Hi,

One small typo is located in the project description, and it jumps out at me every time I see the project in GitHub's trending list.

A reference use of Hashicorp's Raft implementation

@otoolep, can you correct it?
Thanks ;)

Build error

The current master doesn't seem to build:

$ go build .
# _/Users/prologic/tmp/hraftd
./main.go:60:18: too many arguments in call to s.Open
./main.go:60:35: undefined: raft.ServerID

When a node dies, the next promoted master keeps trying to connect to the dead one

Not sure if this is the expected behavior, but when I kill a master, the next node promoted to master keeps trying to reach the old one, generating a bunch of logs like:

2017/10/24 13:48:42 [DEBUG] raft: Votes needed: 2
2017/10/24 13:48:42 [DEBUG] raft: Vote granted from node2 in term 68. Tally: 1
2017/10/24 13:48:44 [WARN] raft: Election timeout reached, restarting election
2017/10/24 13:48:44 [INFO] raft: Node at :12002 [Candidate] entering Candidate state in term 69
2017/10/24 13:48:44 [ERR] raft: Failed to make RequestVote RPC to {Voter node0 :12000}: dial tcp :12000: getsockopt: connection refused
2017/10/24 13:48:44 [ERR] raft: Failed to make RequestVote RPC to {Voter node1 :12001}: dial tcp :12001: getsockopt: connection refused

Failed to decode incoming command: unknown rpc type 80

Upon trying to set a key with the following command:
curl -XPOST localhost:12000/key -d '{"user1": "batman"}'
I get the following error:

2017/09/02 16:35:40 [DEBUG] raft-net: :12000 accepted connection from: [::1]:54508
2017/09/02 16:35:40 [ERR] raft-net: Failed to decode incoming command: unknown rpc type 80

The join requests work, though. Is this an issue with hashicorp/raft, or is it because I updated to Go 1.9?

How to make sure a log entry has been committed?

case "POST":
	// Read the value from the POST body.
	m := map[string]string{}
	if err := json.NewDecoder(r.Body).Decode(&m); err != nil {
		w.WriteHeader(http.StatusBadRequest)
		return
	}
	for k, v := range m {
		if err := s.store.Set(k, v); err != nil {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
	}

The API returns success, but the log entry may not have been committed?
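
(For context, hashicorp/raft's Apply returns an ApplyFuture, and blocking on its Error() method is the usual way to know the entry has been committed and applied to the leader's FSM before the HTTP handler replies. Below is a minimal, illustrative sketch of that pattern, not hraftd's exact code.)

package store

import (
	"time"

	"github.com/hashicorp/raft"
)

// Store sketches only the Raft handle.
type Store struct {
	raft *raft.Raft
}

// apply submits a command to the Raft log and blocks until it is committed
// and applied to this (leader) node's FSM, or until the timeout or an error.
func (s *Store) apply(cmd []byte) error {
	f := s.raft.Apply(cmd, 10*time.Second)
	// Error() does not return until the entry is committed and applied on
	// the leader, so a nil error here means the write is durable in the log.
	return f.Error()
}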

Raft re-adding peer gives Failed to AppendEntries error

Hi, I am using the library to implement a Raft cluster based on your example.

I am currently testing node-failure and network-partition test cases. However, I get the "Failed to AppendEntries" error when trying to reconnect a node.
Let's say I have a three-node cluster: C1, C2, and C3. Initially all the nodes work fine and C1 is elected as the leader. When I shut down a single node (either leader or follower) and reconnect it, it rejoins successfully. However, when I turn down two follower nodes, i.e. C2 and C3, and try to reconnect them, I start getting the AppendEntries error and have to restart all the nodes and delete the storage to make it work.
Similarly, when I create a network partition using iptables rules and disconnect C2 from the other two nodes, they seem to work fine. However, when C2 is reconnected, I get the same error again.

I also tried to create a 5-node cluster with C1, C2, C3, C4, and C5, but I am facing the same issues there as well.

I have already tried the solution mentioned at the following link, but it doesn't work for me. Is this a problem with the hashicorp library or with this implementation? Is there anything that can be done to fix this issue?
#78

The exact error is pasted below:

[DEBUG] raft: Failed to contact cluster3 in 9.867687472s
2021-06-15T15:20:00.772+0300 [ERROR] raft: Failed to make RequestVote RPC to {Voter cluster3 127.0.1.1:31003}: read tcp 127.0.0.1:39376->127.0.1.1:31003: i/o timeout
2021-06-15T15:20:00.781+0300 [ERROR] raft: Failed to AppendEntries to {Voter cluster3 127.0.1.1:31003}: read tcp 127.0.0.1:39382->127.0.1.1:31003: i/o timeout
2021-06-15T15:20:00.899+0300 [ERROR] raft: Failed to heartbeat to 127.0.1.1:31003: read tcp 127.0.0.1:39388->127.0.1.1:31003: i/o timeout
2021-06-15T15:20:01.121+0300 [DEBUG] raft: Failed to contact cluster3 in 10.341185724s

Stale reads

The README says: "Because any node will answer a GET request, and nodes may 'fall behind' updates, stale reads are possible."
If I wanted to fix that, how would I go about it?
First I thought that if I took [1] and replaced it with

func (s *Store) Get(key string) (string, error) {
	if s.raft.State() != raft.Leader {
		return "", fmt.Errorf("not leader")
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.m[key], nil
}

then I would have cured the ailment.
But then I came across [2], and from that thread: "The leader should check before serving any read if it is actually still the leader. It can happen that a leader has not detected a network partition, since heartbeat failures are not detected instantaneously."
So it seems like my code above wouldn't be enough to solve it. How would I completely solve it?

ref:

  1. hraftd/store/store.go

    Lines 109 to 113 in 915a171

    func (s *Store) Get(key string) (string, error) {
        s.mu.Lock()
        defer s.mu.Unlock()
        return s.m[key], nil
    }
  2. rqlite/rqlite#5
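
(The check described in that thread corresponds to hashicorp/raft's VerifyLeader() future, which confirms with the other nodes that this node is still the leader before the read is served. A hedged sketch follows; the Store shape mirrors the snippet in ref 1 and the method name is illustrative.)

package store

import (
	"fmt"
	"sync"

	"github.com/hashicorp/raft"
)

// Store sketches the fields used below.
type Store struct {
	mu   sync.Mutex
	m    map[string]string
	raft *raft.Raft
}

// GetVerified serves a read only after VerifyLeader confirms, via a round of
// contact with the other nodes, that this node is still the leader. This
// closes the window described above at the cost of extra latency per read.
func (s *Store) GetVerified(key string) (string, error) {
	if err := s.raft.VerifyLeader().Error(); err != nil {
		return "", fmt.Errorf("not leader: %v", err)
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.m[key], nil
}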
