ebay / gringofts

Gringofts makes it easy to build a replicated, fault-tolerant, high throughput and distributed event-sourced system.

License: Apache License 2.0

CMake 3.22% Shell 1.92% Dockerfile 0.06% C++ 94.80%
raft-consensus cpp event-sourcing distributed fault-tolerance

gringofts's Introduction

Introduction

Gringofts makes it easy to build a replicated, fault-tolerant, high throughput and distributed event-sourced system. This type of system can process and store critical transaction data which is valuable for an entity.

Industries and typical applications that can benefit from Gringofts include, but are not limited to:

  • Finance - Payment processing system
  • Government - Civic Engagement system
  • Healthcare - Electronic medical records (EMR) software
  • Retail - Orders and sales management system
  • Transport - Logistics and order management system
  • ...

Features

Dependable Data

  1. Data is highly available
    Gringofts enables data to be stored in multiple replicas across data centers. Different deployment models are supported to achieve various levels of availability. For example, in a typical 2-2-1 setup (two replicas in each of the first two data centers and one replica in the third), data is still available even if any one data center is down, because a majority of the five replicas remains reachable.

  2. Data is bit-level accurate across all replicas
    Whichever replica you access, you get the exact same data. This is a top requirement for some applications such as a payment processing system, where each transaction must be accurately recorded.

  3. Data is secure and tamper-proof
    Industry-tested and accepted standards are used to encrypt the data, and a blockchain-like technology is applied to prevent the data from being changed.
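
The availability of a multi-data-center setup can be checked with simple majority arithmetic. The sketch below assumes replication follows a Raft-style majority quorum (the repository is tagged raft-consensus); the function name hasQuorum and its parameters are illustrative and not part of Gringofts.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns true if the replicas in the surviving data centers still form a
// strict majority of all replicas (the quorum rule used by Raft).
// replicasPerDc[i] is the replica count in data center i; down[i] marks an outage.
bool hasQuorum(const std::vector<int>& replicasPerDc, const std::vector<bool>& down) {
  assert(replicasPerDc.size() == down.size());
  const int total = std::accumulate(replicasPerDc.begin(), replicasPerDc.end(), 0);
  int alive = 0;
  for (std::size_t i = 0; i < replicasPerDc.size(); ++i) {
    if (!down[i]) alive += replicasPerDc[i];
  }
  return alive > total / 2;
}
```

For a {2, 2, 1} layout, losing any single data center leaves at least three of five replicas, which is still a majority. Losing both two-replica data centers leaves one replica, which is not.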

High Throughput

Internal experiments show that a single cluster with a 2-1 setup (three replicas across two data centers) can process 8,000 transactions per second. Since the framework is designed to scale linearly, throughput can be increased by adding clusters. Below is a test result of a 667-cluster setup.

Benchmark

Full Audit-ability

This is another must-have feature in most critical enterprise systems. Every write operation to a Gringofts-powered application is persisted by default and is immutable. A comprehensive toolset is also available to access this type of information.

Easy to Use

Gringofts users are usually domain experts. They only need to focus on two things which they are very good at and Gringofts will take care of the rest:

  1. Domain model
    This defines objects required to solve the target business problem. For example, in an Order Management system, Order is a domain object.
  2. Domain object interaction
    This defines how domain objects interact with each other. The interaction is usually modeled as two special objects called Command and Event, for example PlaceOrderCommand and OrderPlacedEvent.
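
A minimal sketch of the two things a user models, using the Order, PlaceOrderCommand and OrderPlacedEvent names from the text. The fields and the processCommand and applyEvent functions are hypothetical illustrations and are not the Gringofts API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Domain object: the state the application cares about.
struct Order {
  std::uint64_t id = 0;
  std::string item;
  bool placed = false;
};

// Command: a request to change state; it may be rejected.
struct PlaceOrderCommand {
  std::uint64_t orderId;
  std::string item;
};

// Event: an immutable fact recording that a change was accepted.
struct OrderPlacedEvent {
  std::uint64_t orderId;
  std::string item;
};

// Validate the command against the current state. An accepted command yields
// events; a rejected command yields none.
std::vector<OrderPlacedEvent> processCommand(const Order& current,
                                             const PlaceOrderCommand& cmd) {
  if (current.placed || cmd.item.empty()) return {};
  return {OrderPlacedEvent{cmd.orderId, cmd.item}};
}

// Apply an event to produce the next state; events are the only way state changes.
Order applyEvent(Order current, const OrderPlacedEvent& evt) {
  current.id = evt.orderId;
  current.item = evt.item;
  current.placed = true;
  return current;
}
```

Keeping validation (command to events) separate from state transition (event to state) is what lets a framework persist and replicate the events while the domain code stays free of storage concerns.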

100% State Reproducibility

Application state at any point of time in history is reproducible. This feature is especially useful when an application recovers after a crash or if users want to debug an issue.
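
Reproducibility follows from replaying the persisted events in order. The sketch below is a generic event-sourcing illustration under that assumption; BalanceChangedEvent and replay are hypothetical names, not Gringofts types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A hypothetical event: a signed change to an account balance.
struct BalanceChangedEvent {
  std::int64_t delta;
};

// Rebuild state by applying the first `count` events of the log in order.
// Because events are immutable and applied deterministically, the same prefix
// always yields the same state.
std::int64_t replay(const std::vector<BalanceChangedEvent>& log, std::size_t count) {
  std::int64_t balance = 0;
  for (std::size_t i = 0; i < count && i < log.size(); ++i) {
    balance += log[i].delta;
  }
  return balance;
}
```

Replaying the full log gives the state used for crash recovery; replaying a shorter prefix gives the state as it was at that point in history, which is the case useful for debugging.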

Get Started

Supported Platforms

Currently the only recommended platform is Ubuntu 16.04. We plan to support more platforms in the near future.

Set up Source Dependencies

bash ./scripts/addSubmodules.sh

Build

Build via Docker (Recommended)

This approach requires minimum dependencies on the target OS as all of the dependencies are encapsulated in a docker image.

  1. Build docker image for compiling the project (one-time setup)
    sudo docker build --rm -t gringofts/dependencies:v1 -f dockers/dependencies/download.Dockerfile .
    sudo docker build --rm -t gringofts/compile:v1 -f dockers/dependencies/install.Dockerfile .
  2. Build binaries
    sudo docker run --workdir "$(pwd)" --mount type=bind,source="$(pwd)",target="$(pwd)" --user "$(id -u)":"$(id -g)" gringofts/compile:v1 hooks/pre-commit

Build directly on local OS

  1. Install external dependencies (one-time setup)
    sudo bash ./scripts/setupDevEnvironment.sh
    if ! grep 'export PATH=/usr/local/go/bin:$PATH' ~/.profile; then echo 'export PATH=/usr/local/go/bin:$PATH' >> ~/.profile; fi && \
    source ~/.profile
  2. Build binaries
    hooks/pre-commit-build

Run Demo App

  1. Backed by a single-cluster setup
    examples/run_demo_backed_by_single_cluster.sh
    You can use grpc_cli to verify:
    ./grpc_cli call 0.0.0.0:50055 gringofts.demo.protos.DemoService.Execute "value:1"
    Sample output:
    connecting to 0.0.0.0:50055
    code: 200
    message: "Success"
    
    Rpc succeeded with OK status
  2. Backed by a three-nodes-cluster setup
    examples/run_demo_backed_by_three_nodes_cluster.sh
  3. Backed by SQLite3
    examples/run_demo_backed_by_sqlite.sh

Development Environment Setup

Please refer to this doc for details.

Core Developers

Please see here for all contributors.

Acknowledgements

Special thanks to everyone who has supported this project.

License Information

Copyright 2019-2020 eBay Inc.

Authors/Developers: Bin (Glen) Geng, Qi (Jacky) Jia

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Use of 3rd Party Code

Some parts of this software include 3rd party code licensed under open source licenses (in alphabetic order):

  1. OpenSSL
    URL: https://www.openssl.org/
    License: https://www.openssl.org/source/license.html
    Originally licensed under the Apache 2.0 license.

  2. RocksDB
    URL: https://github.com/facebook/rocksdb
    License: https://github.com/facebook/rocksdb/blob/master/LICENSE.Apache
    Apache 2.0 license selected.

  3. SQLite
    URL: https://www.sqlite.org/index.html
    License: https://www.sqlite.org/copyright.html
    SQLite Is Public Domain

  4. abseil-cpp
    URL: https://github.com/abseil/abseil-cpp
    License: https://github.com/abseil/abseil-cpp/blob/master/LICENSE
    Originally licensed under the Apache 2.0 license.

  5. cpplint
    URL: https://github.com/google/styleguide
    License: https://github.com/google/styleguide/blob/gh-pages/LICENSE
    Originally licensed under the Apache 2.0 license.

  6. inih
    URL: https://github.com/benhoyt/inih
    License: https://github.com/benhoyt/inih/blob/master/LICENSE.txt
    Originally licensed under the New BSD license.

  7. gRPC
    URL: https://github.com/grpc/grpc
    License: https://github.com/grpc/grpc/blob/master/LICENSE
    Originally licensed under the Apache 2.0 license.

  8. googletest
    URL: https://github.com/google/googletest
    License: https://github.com/google/googletest/blob/master/LICENSE
    Originally licensed under the BSD 3-Clause "New" or "Revised" license.

  9. prometheus-cpp
    URL: https://github.com/jupp0r/prometheus-cpp
    License: https://github.com/jupp0r/prometheus-cpp/blob/master/LICENSE
    Originally licensed under the MIT license.

  10. spdlog
    URL: https://github.com/gabime/spdlog
    License: https://github.com/gabime/spdlog/blob/master/LICENSE
    Originally licensed under the MIT license.

gringofts's People

Contributors

crystal-xu, dongbincheng, huyumars, jackyjia, jingyichen1223, justinabrahms, qiawu

gringofts's Issues

Make raft constants configurable

Currently, in Gringofts, some important Raft parameters are compile-time constants, which makes it hard to tune them for each environment.
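
One possible shape for this, as a sketch only: keep the current values as defaults and let a per-environment key/value configuration override them. The struct, the two parameters, their default values and the key names below are all hypothetical; the issue does not say which constants are meant.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical Raft tuning parameters with built-in defaults.
struct RaftTuning {
  std::uint64_t electionTimeoutMs = 300;    // assumed default, for illustration
  std::uint64_t heartbeatIntervalMs = 100;  // assumed default, for illustration
};

// Override defaults with any values present in a per-environment key/value map.
// Unknown keys are ignored; a malformed number makes std::stoull throw.
RaftTuning loadTuning(const std::map<std::string, std::string>& kv) {
  RaftTuning tuning;
  auto overrideFrom = [&kv](const char* key, std::uint64_t& field) {
    auto it = kv.find(key);
    if (it != kv.end()) field = std::stoull(it->second);
  };
  overrideFrom("raft.election.timeout.ms", tuning.electionTimeoutMs);
  overrideFrom("raft.heartbeat.interval.ms", tuning.heartbeatIntervalMs);
  return tuning;
}
```

With this shape, an environment that sets nothing behaves exactly as before, so the change is backward compatible.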

Why in-memory state update is earlier than raft committed

If the Raft commit has not yet succeeded and the server crashes, causing a leader change, the previously updated in-memory state is discarded.
If an external service queries the state before that happens, it reads state that is later abandoned. Is this scenario tolerable?
Thank you.

Fix grpc backoff time to improve availability in distributed mode

Background

Because the gRPC client applies a reconnect back-off algorithm when a node is unreachable, a node that was down will, with high probability, hit an election timeout and start a new election round after it recovers. If the same leader wins that election while the RPC channel between the recovered node and the leader is still in its back-off period, the node will hit an election timeout again and start a second election round.

The leader will repeatedly step down after receiving RV (RequestVote) requests with a higher term.

Backoff period: best case 1s, worst case nearly 120s. During the backoff period, the recovered node will repeatedly send RV requests.

Solution

Use a fixed gRPC backoff time by setting the option below (the current setting is 100 ms):
"grpc.testing.fixed_reconnect_backoff_ms"

Note: This solution is suitable for scenarios where the server does not serve a large number of clients.
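
To see why the default backoff hurts here, the sketch below computes the retry delays without jitter. The defaults (1 s initial delay, multiplier 1.6, 120 s cap) are the constants given in gRPC's Connection Backoff Protocol document and match the best and worst cases quoted above. The function names are illustrative, and the real client also adds random jitter, which is omitted.

```cpp
#include <algorithm>
#include <vector>

// Reconnect delays (ms) under exponential backoff without jitter:
// delay_n = min(initial * multiplier^n, max).
std::vector<double> backoffScheduleMs(int attempts, double initialMs = 1000.0,
                                      double multiplier = 1.6,
                                      double maxMs = 120000.0) {
  std::vector<double> delays;
  double current = initialMs;
  for (int i = 0; i < attempts; ++i) {
    delays.push_back(std::min(current, maxMs));
    current = std::min(current * multiplier, maxMs);
  }
  return delays;
}

// With a fixed backoff (the proposed 100 ms), every retry waits the same time.
std::vector<double> fixedScheduleMs(int attempts, double fixedMs = 100.0) {
  std::vector<double> delays;
  for (int i = 0; i < attempts; ++i) delays.push_back(fixedMs);
  return delays;
}
```

After roughly a dozen failed attempts the exponential schedule sits at the 120 s cap, so a node that recovers just after a failed attempt can wait up to two minutes before the leader reconnects. The fixed schedule bounds that wait at 100 ms, at the cost of more frequent reconnect attempts.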

Reference

[1] Connection Backoff Protocol

metrics in RaftMonitorAdaptor are not exposed

Background

Metrics in the class below are not exported properly.
[image: screenshot of the class]

How to Reproduce

  1. Get the latest code from master
  2. run examples/run_demo_backed_by_single_cluster.sh
  3. curl localhost:9091/metrics | grep "current_role_v2" returns empty

Acceptance Criteria

Missing metrics are exported

how to solve make problem

In file included from /home/template/download/Gringofts/test/infra/es/ReadonlyCommandEventStoreTest.cpp:18:
/home/template/download/Gringofts/test/infra/es/ReadonlyCommandEventStoreMock.h: In member function ‘void testing::SetMovePointeeAction::Perform(const ArgumentTuple&) const’:
/home/template/download/Gringofts/test/infra/es/ReadonlyCommandEventStoreMock.h:71:15: error: ‘CompileAssertTypesEqual’ is not a member of ‘testing::internal’
71 | internal::CompileAssertTypesEqual<void, Result>();
| ^~~~~~~~~~~~~~~~~~~~~~~
/home/template/download/Gringofts/test/infra/es/ReadonlyCommandEventStoreMock.h:71:39: error: expected primary-expression before ‘void’
71 | internal::CompileAssertTypesEqual<void, Result>();
| ^~~~
test/CMakeFiles/gringofts_TestRunner.dir/build.make:159: recipe for target 'test/CMakeFiles/gringofts_TestRunner.dir/infra/es/ReadonlyCommandEventStoreTest.cpp.o' failed
make[2]: *** [test/CMakeFiles/gringofts_TestRunner.dir/infra/es/ReadonlyCommandEventStoreTest.cpp.o] Error 1
CMakeFiles/Makefile2:4437: recipe for target 'test/CMakeFiles/gringofts_TestRunner.dir/all' failed
make[1]: *** [test/CMakeFiles/gringofts_TestRunner.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 77%] Linking CXX executable ../../../build/StorageMain
[ 77%] Linking CXX executable ../../build/DemoApp
../../build/libgringofts_infra_raft.a(RaftService.cpp.o): In function `grpc::CompletionQueue::~CompletionQueue()':
/usr/local/include/grpcpp/impl/codegen/completion_queue.h:119: undefined reference to `absl::Mutex::~Mutex()'
../../build/libgringofts_infra_raft.a(RaftService.cpp.o): In function `grpc::CompletionQueue::CompletionQueue(grpc_completion_queue_attributes const&)':
/usr/local/include/grpcpp/impl/codegen/completion_queue.h:253: undefined reference to `absl::Mutex::~Mutex()'
../../build/libgringofts_infra_util.a(FileUtil.cpp.o): In function `gringofts::FileUtil::setFileContentWithSync(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
/home/template/download/Gringofts/src/infra/util/FileUtil.cpp:101: undefined reference to `absl::StrCat[abi:cxx11](absl::AlphaNum const&, absl::AlphaNum const&)'
collect2: error: ld returned 1 exit status
src/infra/CMakeFiles/RaftMain.dir/build.make:178: recipe for target 'build/RaftMain' failed
make[2]: *** [build/RaftMain] Error 1
CMakeFiles/Makefile2:4104: recipe for target 'src/infra/CMakeFiles/RaftMain.dir/all' failed
make[1]: *** [src/infra/CMakeFiles/RaftMain.dir/all] Error 2
../../../build/libgringofts_infra_util.a(FileUtil.cpp.o): In function `gringofts::FileUtil::setFileContentWithSync(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
/home/template/download/Gringofts/src/infra/util/FileUtil.cpp:101: undefined reference to `absl::StrCat[abi:cxx11](absl::AlphaNum const&, absl::AlphaNum const&)'
collect2: error: ld returned 1 exit status
src/infra/raft/CMakeFiles/StorageMain.dir/build.make:196: recipe for target 'build/StorageMain' failed
make[2]: *** [build/StorageMain] Error 1
CMakeFiles/Makefile2:4194: recipe for target 'src/infra/raft/CMakeFiles/StorageMain.dir/all' failed

make occur error

My environment: macOS, compiling with clang.
Steps:
1. bash ./scripts/addSubmodules.sh
2. sudo bash ./scripts/setupDevEnvironment.sh
   if ! grep 'export PATH=/usr/local/go/bin:$PATH' ~/.profile; then echo 'export PATH=/usr/local/go/bin:$PATH' >> ~/.profile; fi && \
   source ~/.profile
3. cmake -DCMAKE_BUILD_TYPE=Debug
4. make gringofts_check
5. make

The build then fails with the following error:
[image: screenshot of the error]

How can I fix it? Thank you.

Event store HA

Hi,

For a Gringofts production deployment, what is used as the event store: a database or Kafka? I assume the event store should be highly available so that it survives server failures.

Thanks

@crystal-xu

Handle the case where a follower's log was rolled back

Currently, when a follower's log is rolled back, the leader will crash again and again, which leads to no stable leader.
We plan to fix the issue. The expected behavior is:

  • The cluster runs as normal. There is a stable leader.
  • Leader will send log entries to the follower once its data is recovered.
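
A sketch of the expected behavior on the leader side, assuming the standard Raft bookkeeping of nextIndex and matchIndex per follower. The struct and function are hypothetical, not the Gringofts implementation. Lowering matchIndex relaxes Raft's assumption that a persisted log never shrinks, so the commit index must not be advanced on the basis of the stale value; that part is outside this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Leader-side bookkeeping for one follower (names follow the Raft paper).
struct FollowerProgress {
  std::uint64_t nextIndex;   // next log index to send, always >= 1
  std::uint64_t matchIndex;  // highest index believed to be replicated
};

// Handle an AppendEntries reply. On rejection the leader steps nextIndex back
// toward the follower's reported last index instead of treating it as fatal.
void onAppendEntriesReply(FollowerProgress& p, bool success,
                          std::uint64_t lastSentIndex,
                          std::uint64_t followerLastIndex) {
  assert(p.nextIndex >= 1);
  if (success) {
    p.matchIndex = lastSentIndex;
    p.nextIndex = lastSentIndex + 1;
    return;
  }
  // The follower's log is shorter than the leader assumed (e.g. rolled back).
  // Retry from an index the follower can actually hold, never below 1.
  const std::uint64_t candidate = std::min(p.nextIndex - 1, followerLastIndex + 1);
  p.nextIndex = std::max<std::uint64_t>(candidate, 1);
  p.matchIndex = std::min(p.matchIndex, p.nextIndex - 1);
}
```

Once nextIndex points at an entry the follower can accept, normal replication resumes and the follower catches up without the leader stepping down or crashing.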

Multi-lang support

Background

To enable applications built on top of Gringofts to use other languages such as Java and Go, which are much easier to develop with, Gringofts needs to expose its capabilities to an app layer written in languages other than C++.

Acceptance Criteria

  • A demo in Java or Go that can call into Gringofts

Support PreVote

Prevote prevents disruptions when a server rejoins the cluster.

Without Prevote

When a server is partitioned, it will not receive heartbeats. It will soon increment its term to start an election, although it won’t be able to collect enough votes to become leader. When the server regains connectivity sometime later, its larger term number will propagate to the rest of the cluster (either through the server’s RequestVote requests or through its AppendEntries response). This will force the cluster leader to step down, and a new election will have to take place to select a new leader. Fortunately, such events are likely to be rare, and each will only cause one leader to step down.

What is Prevote

In the Pre-Vote algorithm, a candidate only increments its term if it first learns from a majority of the cluster that they would be willing to grant the candidate their votes (if the candidate’s log is sufficiently up-to-date, and the voters have not received heartbeats from a valid leader for at least a baseline election timeout).

With Prevote

While a server is partitioned, it won’t be able to increment its term, since it can’t receive permission from a majority of the cluster. Then, when it rejoins the cluster, it still won’t be able to increment its term, since the other servers will have been receiving regular heartbeats from the leader. Once the server receives a heartbeat from the leader itself, it will return to the follower state (in the same term).
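
The Pre-Vote rule described above can be sketched as three small predicates. This is an illustration of the algorithm as stated in the text, with hypothetical names and millisecond timers; it is not the Gringofts implementation.

```cpp
#include <cstdint>

// Term and index of the last entry in a server's log.
struct LogPosition {
  std::uint64_t lastTerm;
  std::uint64_t lastIndex;
};

// Raft's "at least as up-to-date" comparison: a higher last term wins,
// and ties are broken by the longer log.
bool isUpToDate(const LogPosition& candidate, const LogPosition& voter) {
  if (candidate.lastTerm != voter.lastTerm) {
    return candidate.lastTerm > voter.lastTerm;
  }
  return candidate.lastIndex >= voter.lastIndex;
}

// A voter grants a pre-vote only if it has not heard from a valid leader for
// at least the baseline election timeout and the candidate's log is
// sufficiently up-to-date. Granting changes no persistent state on the voter.
bool grantPreVote(const LogPosition& candidate, const LogPosition& voter,
                  std::uint64_t msSinceLastHeartbeat,
                  std::uint64_t electionTimeoutMs) {
  return msSinceLastHeartbeat >= electionTimeoutMs && isUpToDate(candidate, voter);
}

// The candidate increments its term only after a majority of the cluster
// (counting its own vote) would be willing to vote for it.
bool mayStartElection(int preVotesGranted, int clusterSize) {
  return preVotesGranted > clusterSize / 2;
}
```

A rejoining server asks voters that are still receiving heartbeats, so grantPreVote returns false on each of them, mayStartElection stays false, and the server's term never gets ahead of the cluster.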

how to improve high-availability when waitTillLeaderIsReadyOrStepDown

If the leader needs to be really ready, it needs to apply all business data successfully after the election.
During this period of time, the leader cannot actually provide services. In Gringofts, the method waitTillLeaderIsReadyOrStepDown will be waiting.
If a lot of data needs to be applied, it will wait for a long time.
Does Gringofts have some high-availability solutions that can minimize the time when services are unavailable?
Thank you.

how to solve the make problem

Scanning dependencies of target lib_inih
[ 0%] Building C object CMakeFiles/lib_inih.dir/third_party/inih/ini.c.o
[ 2%] Building CXX object CMakeFiles/lib_inih.dir/third_party/inih/cpp/INIReader.cpp.o
[ 2%] Linking CXX static library build/liblib_inih.a
[ 2%] Built target lib_inih
[ 2%] Generating proto API for "streaming.proto".
[ 5%] Generating proto API for "raft.proto".
Scanning dependencies of target gringofts_raft_proto_library
[ 5%] Building CXX object src/infra/raft/CMakeFiles/gringofts_raft_proto_library.dir/generated/raft.pb.cc.o
[ 5%] Building CXX object src/infra/raft/CMakeFiles/gringofts_raft_proto_library.dir/generated/raft.grpc.pb.cc.o
[ 7%] Building CXX object src/infra/raft/CMakeFiles/gringofts_raft_proto_library.dir/generated/streaming.pb.cc.o
[ 7%] Building CXX object src/infra/raft/CMakeFiles/gringofts_raft_proto_library.dir/generated/streaming.grpc.pb.cc.o
[ 7%] Linking CXX static library ../../../build/libgringofts_raft_proto_library.a
[ 7%] Built target gringofts_raft_proto_library
Scanning dependencies of target gringofts_infra_raft
[ 7%] Building CXX object src/infra/raft/CMakeFiles/gringofts_infra_raft.dir/metrics/RaftMonitorAdaptor.cpp.o
[ 7%] Building CXX object src/infra/raft/CMakeFiles/gringofts_infra_raft.dir/RaftLogStore.cpp.o
In file included from /home/template/download/Gringofts/src/infra/raft/../es/CommandMetaData.h:21,
from /home/template/download/Gringofts/src/infra/raft/../es/Command.h:23,
from /home/template/download/Gringofts/src/infra/raft/RaftLogStore.h:18,
from /home/template/download/Gringofts/src/infra/raft/RaftLogStore.cpp:15:
/home/template/download/Gringofts/src/infra/raft/../es/MetaData.h:20:10: fatal error: store/generated/store.pb.h: No such file or directory
20 | #include "store/generated/store.pb.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
src/infra/raft/CMakeFiles/gringofts_infra_raft.dir/build.make:94: recipe for target 'src/infra/raft/CMakeFiles/gringofts_infra_raft.dir/RaftLogStore.cpp.o' failed
make[2]: *** [src/infra/raft/CMakeFiles/gringofts_infra_raft.dir/RaftLogStore.cpp.o] Error 1
CMakeFiles/Makefile2:4158: recipe for target 'src/infra/raft/CMakeFiles/gringofts_infra_raft.dir/all' failed
make[1]: *** [src/infra/raft/CMakeFiles/gringofts_infra_raft.dir/all] Error 2
Makefile:148: recipe for target 'all' failed

provide an interface to get raft members' offsets

Providing such an interface could enable some advanced features. For example, reading from in-sync followers could be achieved by calculating the offset lags between followers and leader from the members' offsets returned by this interface. In that case slow followers with large lags could be evicted from the read list.

PR: #63
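
As a sketch of the read-list idea, assuming the interface returns one offset per member: compute each follower's lag behind the leader and keep only those within a threshold. MemberOffset, inSyncFollowers and maxLag are hypothetical names and do not describe the interface added in the PR.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct MemberOffset {
  std::string address;
  std::uint64_t offset;  // last log index the member has replicated
};

// Return the addresses of followers whose lag behind the leader is at most
// maxLag entries. Followers with larger lags are left out of the read list.
std::vector<std::string> inSyncFollowers(std::uint64_t leaderOffset,
                                         const std::vector<MemberOffset>& followers,
                                         std::uint64_t maxLag) {
  std::vector<std::string> readable;
  for (const auto& f : followers) {
    // Guard against unsigned underflow if a reported offset is ahead of ours.
    const std::uint64_t lag = leaderOffset > f.offset ? leaderOffset - f.offset : 0;
    if (lag <= maxLag) readable.push_back(f.address);
  }
  return readable;
}
```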

Support non-voting member role for Raft cluster

In order to avoid availability gaps, Raft introduces an additional phase before the configuration change, in which a new server joins the cluster as a non-voting member. The leader replicates log entries to it, but it is not yet counted towards majorities for voting or commitment purposes.
The mechanism of non-voting servers can also be useful in other contexts. For example, it can be used to replicate the state to a large number of servers, which can serve read-only requests with relaxed consistency.
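
The key property is that non-voting members receive the log but never count towards a majority. A minimal sketch of a commit check under that rule, with hypothetical type and function names:

```cpp
#include <vector>

enum class Role { Voter, NonVoter };

struct Member {
  Role role;
  bool acked;  // has this member acknowledged the entry?
};

// An entry is committed once a strict majority of *voting* members have
// acknowledged it. Non-voting members are replicated to but never counted.
bool isCommitted(const std::vector<Member>& members) {
  int voters = 0;
  int votersAcked = 0;
  for (const auto& m : members) {
    if (m.role != Role::Voter) continue;
    ++voters;
    if (m.acked) ++votersAcked;
  }
  return voters > 0 && votersAcked > voters / 2;
}
```

Because the quorum size depends only on the voters, adding many non-voting read replicas does not make commits slower or less available.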

Why not use the ring buffer like Disruptor

Hi, why don't we use ring buffer queues like Disruptor? I found that we use double-buffered queues everywhere.
Is it because a double-buffered queue avoids synchronization and mutexes entirely? Does it perform better than a ring buffer queue?
Thank you!

Setup CI

Background

We need a way to run CI daily and every time a PR is merged to ensure various quality scenarios are met.

Acceptance Criteria

  • Travis CI is set up for daily run and private-ci

Fix flaky test RaftMonitorAdaptorTest.raftAdaptorTest

Trace

[ RUN      ] RaftMonitorAdaptorTest.raftAdaptorTest
[2020-03-26 23:12:34.479] [info] [Util.h:88] Execute command 'mkdir ../test/infra/raft/node_1', Output ''
[2020-03-26 23:12:34.480] [info] [RaftCore.cpp:58] ConfigurableVars: max.batch.size=2000, max.len.in.bytes=4000000, max.decr.step=2000, max.tailed.entry.num=5.
[2020-03-26 23:12:34.480] [info] [RaftCore.cpp:67] read raft cluster conf from local, [email protected]:5253,[email protected]:5254,[email protected]:5255, self.id=1
[2020-03-26 23:12:34.482] [info] [RaftCore.cpp:113] cluster.size=3, self.id=1, self.address=0.0.0.0:5253
[2020-03-26 23:12:34.482] [info] [RaftCore.cpp:134] Use SegmentLog, storage.dir=../test/infra/raft/node_1, segment.data.size.limit=67108864, segment.meta.size.limit=4194304
[2020-03-26 23:12:34.482] [warning] [CryptoUtil.cpp:48] Raft log is plain.
[2020-03-26 23:12:34.487] [info] [SegmentLog.cpp:117] ignore file: ../test/infra/raft/node_1/first_index
[2020-03-26 23:12:34.487] [info] [SegmentLog.cpp:117] ignore file: ../test/infra/raft/node_1/vote_for
[2020-03-26 23:12:34.487] [info] [SegmentLog.cpp:117] ignore file: ../test/infra/raft/node_1/current_term
[2020-03-26 23:12:34.487] [info] [Segment.cpp:105] Create an activeSegment, timeCost=0.0803020ms
[2020-03-26 23:12:34.487] [info] [TlsUtil.h:96] Client Side TLS disabled.
[2020-03-26 23:12:34.488] [info] [TlsUtil.h:96] Client Side TLS disabled.
[2020-03-26 23:12:34.488] [info] [RaftCore.cpp:178] skip setup of raft main loop for ut.
[2020-03-26 23:12:34.488] [info] [TlsUtil.h:78] Server Side TLS disabled.
Segmentation fault (core dumped)

Acceptance Criteria

  • Root cause identified
  • Above test case re-enabled

Support customized way to get AES key

Currently, Gringofts can only read the AES key from a protobuf-formatted file. To let users obtain AES keys from other sources, for example a security service, Gringofts will extend its interface so that a factory can be passed to RaftCore. The factory is responsible for creating a customized AES key fetcher. If no factory is passed in, Gringofts keeps its current behavior.
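
One way the extension could look, as a sketch only. AesKeyFetcher, FileKeyFetcher, AesKeyFetcherFactory and makeKeyFetcher are hypothetical names; the real RaftCore signature is not shown in this issue, and the file fetcher below returns a placeholder string instead of parsing a protobuf file.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface: anything that can produce the AES key.
class AesKeyFetcher {
 public:
  virtual ~AesKeyFetcher() = default;
  virtual std::string fetchKey() = 0;
};

// Stand-in for the existing behavior (key read from a file).
class FileKeyFetcher : public AesKeyFetcher {
 public:
  explicit FileKeyFetcher(std::string path) : path_(std::move(path)) {}
  // Placeholder: a real fetcher would read and parse the file at path_.
  std::string fetchKey() override { return "key-from:" + path_; }

 private:
  std::string path_;
};

using AesKeyFetcherFactory = std::function<std::unique_ptr<AesKeyFetcher>()>;

// Use the caller's factory when one is provided; otherwise fall back to the
// file-based fetcher so that existing deployments behave as before.
std::unique_ptr<AesKeyFetcher> makeKeyFetcher(const AesKeyFetcherFactory& factory,
                                              const std::string& defaultPath) {
  if (factory) return factory();
  return std::unique_ptr<AesKeyFetcher>(new FileKeyFetcher(defaultPath));
}
```

A caller that wants keys from a security service would pass a factory returning its own AesKeyFetcher subclass; a caller that passes an empty factory gets the file-based default.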
