talent-plan / tinykv

A course to build distributed key-value service based on TiKV model

License: Apache License 2.0

Makefile 0.44% Go 99.16% Shell 0.40%

tinykv's Introduction

The TinyKV Course

The TinyKV course builds a key-value storage system with the Raft consensus algorithm. It is inspired by MIT 6.824 and the TiKV project.

After completing this course, you will have the knowledge to implement a horizontally scalable, highly available key-value storage service with distributed transaction support. You will also gain a better understanding of TiKV's architecture and implementation.

Course Architecture

At the beginning, the whole project is skeleton code for a key-value server and a scheduler server; you need to finish the core logic step by step:

  • Standalone KV
    • Implement a standalone storage engine.
    • Implement raw key-value service handlers.
  • Raft KV
    • Implement the basic Raft algorithm.
    • Build a fault-tolerant KV server on top of Raft.
    • Add the support of Raft log garbage collection and snapshot.
  • Multi-raft KV
    • Implement membership change and leadership change in the Raft algorithm.
    • Implement conf change and region split on Raft store.
    • Implement a basic scheduler.
  • Transaction
    • Implement the multi-version concurrency control layer.
    • Implement handlers of KvGet, KvPrewrite, and KvCommit requests.
    • Implement handlers of KvScan, KvCheckTxnStatus, KvBatchRollback, and KvResolveLock requests.

Code Structure

(overview architecture diagram)

Similar to the TiDB + TiKV + PD architecture, which separates storage from computation, TinyKV focuses only on the storage layer of a distributed database system. If you are also interested in the SQL layer, please see TinySQL. Besides that, there is a component called TinyScheduler, which acts as the central control of the whole TinyKV cluster: it collects information from the heartbeats of the TinyKV instances, then generates scheduling tasks and distributes them to the instances. All instances communicate with each other via RPC.

The whole project is organized into the following directories:

  • kv contains the implementation of the key-value store.
  • raft contains the implementation of the Raft consensus algorithm.
  • scheduler contains the implementation of the TinyScheduler, which is responsible for managing TinyKV nodes and generating timestamps.
  • proto contains the protocol definitions used by TinyKV for all communication between nodes and processes, which happens over gRPC using Protocol Buffers, along with the generated Go code that you can use (see the client sketch after this list).
  • log contains utilities for level-based logging.
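For a rough feel of what the generated code gives you, here is a minimal client sketch. It is illustrative only: the package paths (proto/pkg/tinykvpb, proto/pkg/kvrpcpb), the TinyKv service and client names, and the listening address are assumptions to verify against the .proto files and the server flags.

package main

import (
	"context"
	"fmt"
	"log"

	"google.golang.org/grpc"

	"github.com/pingcap-incubator/tinykv/proto/pkg/kvrpcpb"
	"github.com/pingcap-incubator/tinykv/proto/pkg/tinykvpb"
)

func main() {
	// The address is an assumption; check the flags tinykv-server is started with.
	conn, err := grpc.Dial("127.0.0.1:20160", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := tinykvpb.NewTinyKvClient(conn)
	resp, err := client.RawGet(context.Background(), &kvrpcpb.RawGetRequest{
		Cf:  "default",
		Key: []byte("hello"),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("value=%q notFound=%v\n", resp.Value, resp.NotFound)
}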

Reading List

We provide a reading list covering distributed storage systems. Though not all of the items are directly related to this course, they can help you build your knowledge of the field.

Also, you're encouraged to read the overview of TiKV's and PD's design to get a general impression of what you will build.

Build TinyKV from Source

Prerequisites

  • git: The source code of TinyKV is hosted on GitHub as a git repository. To work with the git repository, please install git.
  • go: TinyKV is a Go project. To build TinyKV from source, please install Go version 1.13 or later.

Clone

Clone the source code to your development machine.

git clone https://github.com/tidb-incubator/tinykv.git

Build

Build TinyKV from the source code.

cd tinykv
make

This builds the tinykv-server and tinyscheduler-server binaries into the bin directory.

Run TinyKV with TinySQL

  1. Get tinysql-server by following its documentation.
  2. Put the tinyscheduler-server, tinykv-server, and tinysql-server binaries into a single directory.
  3. In that directory, run the following commands:
mkdir -p data
./tinyscheduler-server
./tinykv-server -path=data
./tinysql-server --store=tikv --path="127.0.0.1:2379"

Now you can connect to the database with an official MySQL client:

mysql -u root -h 127.0.0.1 -P 4000

Autograding and certification

Since June 2022, we have been using GitHub Classroom to accept labs and provide timely autograding. The GitHub Classroom invitation is https://classroom.github.com/a/cdlNNrFU. The discussion WeChat/Slack group and the certification you receive after passing the class are provided in the TinyKV learning class.

Autograding is a workflow that automatically runs the test cases and gives timely feedback. However, due to limitations of GitHub Classroom, in order to make Go work and run the tests on our self-hosted machines, you need to overwrite the workflow generated by GitHub Classroom and commit it:

cp scripts/classroom.yml .github/workflows/classroom.yml
git add .github
git commit -m"update github classroom workflow"

Contributing

Any feedback and contributions are greatly appreciated. Please see the issues if you want to join the development.

tinykv's People

Contributors

connor1996, dependabot[bot], fengzixu, huaouo, jldataku, jychen7, kikimo, learndatastorage, michaeldesteven, mind1949, moonm3n, nasnoisaac, niebayes, ninglin-p, nothatdinger, nrc, onesizefitsquorum, oocococo, rapiz1, smith-cruise, super-long, trafalgarricardolu, unconsolable, wenyxu, willendless, xiongjiwei, yangkeao, yanguwan, zhuo1angt, zyfzyf

tinykv's Issues

`TestRawNodeRestart2AC` & `TestRawNodeRestartFromSnapshot2C` expected result don't have softstate and hardstate

DeepEqual will fail because want and rd differ in SoftState and HardState.

maybe we need something like

want := Ready{
	&SoftState{
		Lead:      None,
		RaftState: StateFollower,
	},
	pb.HardState{
		Term:   st.Term,
		Commit: st.Commit,
	},
	[]pb.Entry{},
	pb.Snapshot{},
	// commit up to commit index in st
	entries[:st.Commit],
	make([]pb.Message, 0), // Messages []pb.Message should be empty slice or nil?
}

Region meta in Scheduler and TinyKV have the same address

https://github.com/pingcap-incubator/tinykv/blob/1df74d600cb0c5001f32d2d21d6f89e9744ee3e0/kv/test_raftstore/scheduler.go#L345-L348
From the code, it seems that region meta in Scheduler and TinyKV should be independent.
https://github.com/pingcap-incubator/tinykv/blob/1df74d600cb0c5001f32d2d21d6f89e9744ee3e0/kv/raftstore/peer.go#L345-L352
but it sends a pointer to the scheduler. This will cause problems when changing peers. Maybe it should be cloned first.
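For illustration, a deep copy along these lines would avoid sharing the pointer. This is only a sketch; the metapb field names (RegionEpoch, Peers, and so on) follow the upstream definitions and should be checked against TinyKV's proto package.

// cloneRegion returns an independent copy of the region meta so the scheduler
// and the raftstore no longer share the same underlying object.
func cloneRegion(region *metapb.Region) *metapb.Region {
	if region == nil {
		return nil
	}
	clone := &metapb.Region{
		Id:       region.Id,
		StartKey: append([]byte(nil), region.StartKey...),
		EndKey:   append([]byte(nil), region.EndKey...),
	}
	if region.RegionEpoch != nil {
		clone.RegionEpoch = &metapb.RegionEpoch{
			ConfVer: region.RegionEpoch.ConfVer,
			Version: region.RegionEpoch.Version,
		}
	}
	for _, p := range region.Peers {
		clone.Peers = append(clone.Peers, &metapb.Peer{Id: p.Id, StoreId: p.StoreId})
	}
	return clone
}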

Accessing the global context is racy

For example, storeMeta is accessed in both newRaftWorker (mutated for conf change) and newStoreWorker (read for heartbeating to pd):

https://github.com/pingcap-incubator/tinykv/blob/fc36c301fbf92d05e30e629453a7c3282399af75/kv/raftstore/raftstore.go#L91

I was working on a solution and noticed that inconsistent region info was sent to the scheduler (the peer and region epoch fields in storeMeta were inconsistent), probably because I was mutating it in place, updating fields one after another, while the heartbeat worker's reads saw two fields, one from before and one from after the mutation.

I avoided this issue in my solution by doing a deep copy:

https://github.com/junlong-gao/tinykv/blob/0fc6c28acffee911a38b2c043809e6977d7d2822/kv/raftstore/peer_msg_handler.go#L79

yet it is still racy.

Perhaps we should add a mutex to the global context so folks working on this will notice the race condition here?
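As a sketch of that suggestion (not the actual raftstore code), guarding storeMeta with an embedded lock would look roughly like this; the real struct layout and call sites may differ:

import "sync"

// Sketch only: embed a lock in the shared storeMeta so the raft worker (which mutates
// region info during conf change) and the store worker (which reads it to build the
// scheduler heartbeat) never observe a half-applied update.
type storeMeta struct {
	sync.RWMutex
	// ... existing fields (region map, region ranges, ...) ...
}

// Mutating path: hold the write lock for the whole multi-field update.
func (m *storeMeta) update(mutate func()) {
	m.Lock()
	defer m.Unlock()
	mutate()
}

// Reading path: hold the read lock and copy out whatever the heartbeat needs.
func (m *storeMeta) view(read func()) {
	m.RLock()
	defer m.RUnlock()
	read()
}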

eraftpb.pb.go fileDescriptor inconsistency

I ran make proto locally, then used git status to check whether any files had been modified; eraftpb.pb.go is the only one. I found that the fileDescriptor of eraftpb.proto generated by protoc changed from 6ff387e2d76c3034 to 2f2e0bcef614736b.
Actually, I think this generated code should not be pushed to the git repo, but since these files are already pushed, would it be better to make them consistent? Do I need to fix this, or is it just due to a difference in environment?

Should the heartbeat RPC update commit?

TestHandleHeartbeat2AA expects commit to change.

But later I encountered a bug like this (see the sketch after the list):

  • An old leader is partitioned away and receives a log entry (say, index 10) from a client, with no chance to commit it.
  • Meanwhile, the majority elects a new leader, and the commit index advances to 15.
  • When the old leader rejoins the cluster and receives a heartbeat RPC saying commit is 15, it will apply the wrong log entry at index 10 (this entry should be overwritten).
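For reference, a common safeguard (the one etcd's raft uses) is for the leader to advertise in a heartbeat only a commit index the follower is known to have replicated, so a lagging follower is never told to commit entries it does not hold. This is only a sketch using the course skeleton's field names (Prs, Match, RaftLog.committed), which may differ in your code:

// Leader side: cap the advertised commit by the follower's Match index.
func (r *Raft) sendHeartbeat(to uint64) {
	commit := min(r.RaftLog.committed, r.Prs[to].Match)
	r.msgs = append(r.msgs, pb.Message{
		MsgType: pb.MessageType_MsgHeartbeat,
		From:    r.id,
		To:      to,
		Term:    r.Term,
		Commit:  commit,
	})
}

func min(a, b uint64) uint64 {
	if a < b {
		return a
	}
	return b
}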

raft_test doesn't Step through messages generated by Raft.tick()?

From what I understand, both local messages and network messages should be appended to Raft.msgs, and some code is responsible for looping through Raft.msgs and calling Raft.Step. Is that correct?

If so, the local messages (MessageType_MsgHup and MessageType_MsgBeat) generated by Raft.tick() just sit in msgs waiting to be processed, and until then the node's state will not change.

If that is the case, the following code will not terminate:
https://github.com/pingcap-incubator/tinykv/blob/1df74d600cb0c5001f32d2d21d6f89e9744ee3e0/raft/raft_test.go#L797-L799
https://github.com/pingcap-incubator/tinykv/blob/1df74d600cb0c5001f32d2d21d6f89e9744ee3e0/raft/raft_test.go#L853-L855
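One common reading of the skeleton, matching etcd's raft, is that local messages are never queued in r.msgs at all: tick() feeds MsgHup/MsgBeat straight into Step, so the tests observe the state change immediately. A sketch, with field names assumed from the skeleton:

func (r *Raft) tick() {
	switch r.State {
	case StateLeader:
		r.heartbeatElapsed++
		if r.heartbeatElapsed >= r.heartbeatTimeout {
			r.heartbeatElapsed = 0
			// Local message: handled immediately, not appended to r.msgs.
			_ = r.Step(pb.Message{MsgType: pb.MessageType_MsgBeat, From: r.id, To: r.id})
		}
	default:
		r.electionElapsed++
		if r.electionElapsed >= r.electionTimeout {
			r.electionElapsed = 0
			_ = r.Step(pb.Message{MsgType: pb.MessageType_MsgHup, From: r.id, To: r.id})
		}
	}
}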

Shall we expose the ErrKeyNotFound error when we invoke the GetCF method?

I encountered a pitfall in kv/server/server_test.go

Firstly

The test case TestRawGetAfterRawDelete1 at line 169 says we need to expose the ErrKeyNotFound error when we try to get a non-existent key:

	resp, err := server.RawGet(nil, get)
	assert.Nil(t, err)
	assert.True(t, resp.NotFound)

Secondly

The code in TestRawDelete1 says we should return a nil error when we try to get a key that has already been deleted:

	_, err := server.RawDelete(nil, req)
	assert.Nil(t, err)
	val, err := Get(s, cf, []byte{99})
	assert.Equal(t, nil, err)
	assert.Equal(t, []byte(nil), val)

In the end

I am able to pass make project1 when I change this line from

	assert.Equal(t, nil, err)

to

	assert.Equal(t, badger.ErrKeyNotFound, err)

What I want to know

So, I wonder: is this a bug, or is there a missing part of my code that I should continue to implement?
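For what it's worth, one way to satisfy both tests without changing them is to treat badger.ErrKeyNotFound at the handler level as a NotFound response rather than a failure. This is only a sketch; the reader API (Reader, GetCF, Close) and the response fields are assumed from the project skeleton:

func (server *Server) RawGet(_ context.Context, req *kvrpcpb.RawGetRequest) (*kvrpcpb.RawGetResponse, error) {
	reader, err := server.storage.Reader(req.Context)
	if err != nil {
		return nil, err
	}
	defer reader.Close()

	value, err := reader.GetCF(req.Cf, req.Key)
	if err == badger.ErrKeyNotFound {
		// Absent key: report NotFound instead of propagating the error.
		return &kvrpcpb.RawGetResponse{NotFound: true}, nil
	}
	if err != nil {
		return nil, err
	}
	return &kvrpcpb.RawGetResponse{Value: value}, nil
}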

panic in NewTestServer

I encountered a bug in the test testClusterSuite.TestConcurrentHandleRegion3C. When NewTestServer is called, it panics with the following error stack:

PANIC: cluster_test.go:376: testClusterSuite.TestConcurrentHandleRegion3C

... Panic: invalid page type: 0: 4 (PC=0x43F645)

/usr/lib/go-1.14/src/runtime/panic.go:969
  in gopanic
/home/xjw/go/pkg/mod/go.etcd.io/[email protected]/cursor.go:250
  in Cursor.search
/home/xjw/go/pkg/mod/go.etcd.io/[email protected]/cursor.go:159
  in Cursor.seek
/home/xjw/go/pkg/mod/go.etcd.io/[email protected]/bucket.go:165
  in Bucket.CreateBucket
/home/xjw/go/pkg/mod/go.etcd.io/[email protected]/tx.go:108
  in Tx.CreateBucket
/home/xjw/go/pkg/mod/go.etcd.io/[email protected]/mvcc/backend/batch_tx.go:72
  in batchTx.UnsafeCreateBucket
/home/xjw/go/pkg/mod/go.etcd.io/[email protected]/mvcc/kvstore.go:149
  in NewStore
/home/xjw/go/pkg/mod/go.etcd.io/[email protected]/mvcc/watchable_store.go:78
  in newWatchableStore
/home/xjw/go/pkg/mod/go.etcd.io/[email protected]/mvcc/watchable_store.go:73
  in New
/home/xjw/go/pkg/mod/go.etcd.io/[email protected]/etcdserver/server.go:543
  in NewServer
/home/xjw/go/pkg/mod/go.etcd.io/[email protected]/embed/etcd.go:211
  in StartEtcd
server.go:163
  in Server.startEtcd
server.go:300
  in Server.Run
testutil.go:48
  in NewTestServer
cluster_test.go:379
  in testClusterSuite.TestConcurrentHandleRegion3C
/usr/lib/go-1.14/src/reflect/value.go:321
  in Value.Call
/home/xjw/go/pkg/mod/github.com/pingcap/[email protected]/check.go:850
  in suiteRunner.forkTest.func1
/home/xjw/go/pkg/mod/github.com/pingcap/[email protected]/check.go:739
  in suiteRunner.forkCall.func1
/usr/lib/go-1.14/src/runtime/asm_amd64.s:1373
  in goexit

I do not know whether it is my fault.

Possible typo in the README

the "overview of the code section" starts with

Same as the architect TiDB + TiKV + PD

should it not be this?

Same as the architecture of TiDB + TiKV + PD

raftstore: add integration tests

LAB2: kvraft

PartA: raft

  • raft related test

PartB: raftstore + apply

  • port MIT6.824 3A #63

PartC: snapshot + raftloggc

  • port MIT6.824 3B #72

LAB3: multiraft

PartA: support conf change in raft

  • raft related test

PartB: implement conf change in raftstore

  • test_simple_conf_change #75
  • test_split_brain

PartC: split

  • test_base_split_region #83
  • test_split_epoch_not_match

transaction layer

This issue is for the layer between the raft store and the server API, i.e., the server and storage modules in TiKV.

This layer must support the following:

  • translate protobuf data into internal data and commands
  • schedule and run commands
  • ensure that commands executing in parallel are serializable
  • handle locks and locking of data
  • manage locks (handle lock timeouts, etc)
  • delegate actual storage of key/values to raftstore
  • provide consistent reads (from snapshots)
  • handle rollbacks in the event of conflicts

At the highest level, this should be implemented in kv/tikv/server.go by handling gRPC messages and calling the methods of InnerServer.

Add comments for runner handler

What does storage.Start() do

I think this function initializes storage, but it is not called in kv/server/server_test.go. So what does this function do?

transactions: multi-threaded transactions

Currently, the scheduler executes all requests sequentially on a single thread. We should execute them concurrently. This will require synchronization of the storage layer using latches.
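A minimal sketch of per-key latching (not the project's actual latches package): a command acquires a latch for every key it touches, conflicting commands are serialized, and commands on disjoint keys run concurrently.

import "sync"

type Latches struct {
	mu      sync.Mutex
	latched map[string]chan struct{} // key -> channel closed when the current owner releases
}

func NewLatches() *Latches {
	return &Latches{latched: make(map[string]chan struct{})}
}

// Acquire blocks until all keys are free, then claims them for the caller.
func (l *Latches) Acquire(keys [][]byte) {
	for {
		l.mu.Lock()
		var wait chan struct{}
		for _, k := range keys {
			if ch, ok := l.latched[string(k)]; ok {
				wait = ch
				break
			}
		}
		if wait == nil {
			ch := make(chan struct{})
			for _, k := range keys {
				l.latched[string(k)] = ch
			}
			l.mu.Unlock()
			return
		}
		l.mu.Unlock()
		<-wait // another command holds one of our keys; wait, then retry
	}
}

// Release frees the keys and wakes up any waiting commands.
func (l *Latches) Release(keys [][]byte) {
	l.mu.Lock()
	defer l.mu.Unlock()
	closed := make(map[chan struct{}]bool)
	for _, k := range keys {
		if ch, ok := l.latched[string(k)]; ok {
			delete(l.latched, string(k))
			if !closed[ch] {
				close(ch)
				closed[ch] = true
			}
		}
	}
}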

Cleanup implementation

Clean up the implementation that should be filled in by students and move it to a new branch:

  • lab1
  • lab2
  • lab3
  • lab4

Failed to perform `make` command under macOS with go 1.13.6

After I ran the make command to try out this project, I hit a Go compile problem. The error message is below:

GO111MODULE=on go build  -tags codes -ldflags '-L/usr/local/opt/icu4c/lib -X "main.gitHash=`git rev-parse HEAD`" ' -o bin/tinykv-server kv/tinykv-server/main.go
# command-line-arguments
flag provided but not defined: -L/usr/local/opt/icu4c/lib

Then I checked the Makefile under the root directory

LDFLAGS             += -X "main.gitHash=`git rev-parse HEAD`" 

It seems to just add the version info; the -L/usr/local/opt/icu4c/lib flag wasn't added by tinykv. So I checked the value of the LDFLAGS environment variable on my host.

The answer is here

# echo $LDFLAGS                                                                                                                                                                                               
-L/usr/local/opt/icu4c/lib

LDFLAGS has an initial value which affects this project.

My solution is easy:

export LDFLAGS=''

Actually, I am not sure whether this initial value is needed by tinykv. Maybe the Makefile could clear LDFLAGS to avoid such errors. That would be friendlier for students.

Possible typo in the README

Course material
Please follow the course material to learn the background knowledge and finish code step by step.

Should be Course Materials?

Consider implementing error for KeyError

Just like RegionError does, we can implement the error interface for kvrpcpb.KeyError like this:

type KeyError struct {
	Err *kvrpcpb.KeyError
}

func (ke *KeyError) Error() string {
	return ke.Err.String()
}

So functions can fold the kvrpcpb.KeyError return value into error. For example:

func (p *Prewrite) prewriteMutation(txn *mvcc.MvccTxn, mut *kvrpcpb.Mutation) (*kvrpcpb.KeyError, error)  {}
->
func (p *Prewrite) prewriteMutation(txn *mvcc.MvccTxn, mut *kvrpcpb.Mutation) error {}
func (scan *Scanner) Next() ([]byte, []byte, interface{}) {}
-> 
func (scan *Scanner) Next() ([]byte, []byte, error) {}
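A possible usage at the handler boundary, assuming the wrapper above; the response field name (Errors) and the surrounding signature are illustrative rather than taken from the final code:

// errors.As (standard library "errors") recovers the typed wrapper from a plain error.
func collectKeyError(resp *kvrpcpb.PrewriteResponse, err error) error {
	var keyErr *KeyError
	if errors.As(err, &keyErr) {
		// A per-key conflict: report it in the response instead of failing the RPC.
		resp.Errors = append(resp.Errors, keyErr.Err)
		return nil
	}
	return err // anything else (e.g. a storage error) still aborts the request
}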

Simplify Scheduler code base

Schedule Part

At present, all sent messages follow these flows:

request -> (get store/bind stream/get region) -> raft cluster HandleRegionHeartbeat -> Dispatch

drivePushOperator -> PushOperators -> Dispatch -> SendScheduleCommand

PromoteWaitingOperator -> addOperatorLocked -> SendScheduleCommand

All operators are created in these flows:

Scheduler runScheduler -> AddWaitingOperator 

patrolRegions -> AddWaitingOperator

We can simplify the scheduler on these points:

  • Remove the merge operator and all related changes (because a merge operation has two operators)

  • Simplify the overall routine. Maybe the simplest way: request -> operator. Delete the waiting queue, pushDriver, and so on. Should we keep a waiting queue (or simply cancel all expired operations)?

schedule:

  • checker: make the replica checker only check down peers and offline peers (or just remove replica_checker) #61
  • filter: remove unnecessary filters #62
  • operator: only keep TransferLeader, AddPeer, RemovePeer #61
  • opt: remove these options #62
  • placement: remove it #61
  • remove waiting queue, pushDriver #61

schedulers:

  • remove label scheduler #61
  • simplify balance region scheduler #62

Scope/requirements

I would like to define what is in and out of scope for the initial iteration of TinyKV. I'll start a list here; please comment if I got something wrong or if there are other things that should be in or out.

cc @siddontang, @Connor1996

In scope

  • Test framework
  • Distribution using Raft (Raft implemented as part of TinyKV, not using a library)
  • Raw and transactional API
  • Optimistic transactions (Percolator-style)
  • PD to balance regions and provide timestamps
  • Interoperable with TinySQL

Out of scope

  • Any kind of engine abstraction
  • Debug or diagnostics services (for development only)
  • Metrics/statistics
  • Any GC
  • Pessimistic transactions
  • backup/restore; sst import
  • profiler
  • status server
  • coprocessor
  • batched commands
  • security
  • interoperable with real TiDB and real PD

Should messages from an old term be ignored or not?

In raft_test.go (line 535), TestHandleMessageType_MsgAppend2AB test cases 8 and 9: the message term is 1, the messages are sent to a follower in term 2, and the follower is expected to append them and update its state. From what I understand, these messages should be ignored because their term is lower than the follower's. Can someone explain this? Thanks. :)
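For context, the usual Raft rule is that an append from a lower term must not modify the log; the follower answers with its own (higher) term and a rejection so the stale leader can step down. Whether test cases 8 and 9 really carry a lower term is worth re-checking, but the general handling would look roughly like this (field names assumed from the skeleton):

func (r *Raft) Step(m pb.Message) error {
	if m.Term < r.Term && m.MsgType == pb.MessageType_MsgAppend {
		// Stale leader: reply with our term and reject; do not touch the log.
		r.msgs = append(r.msgs, pb.Message{
			MsgType: pb.MessageType_MsgAppendResponse,
			From:    r.id,
			To:      m.From,
			Term:    r.Term,
			Reject:  true,
		})
		return nil
	}
	// ... normal handling for m.Term >= r.Term ...
	return nil
}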

Creating a peer twice may be fatal

  1. It may create a peer twice when a region splits:
  • When a region splits, it should call createPeer.
  • When applying AdminCmdType_Split, there is no way to find out whether the new peer is already being created or not.
  • store_worker may call replicatePeer to create the new peer when it receives a msg from another node.

So it may create the peer twice at the same time.

  2. An error occurs when the peer is created twice at the same time:
  • createPeer will initialize raftLocalState.LastIndex and raftLocalState.LastTerm to 5, while replicatePeer will initialize them to 0, and both write them to storage.

Imagine this: createPeer writes raftLocalState.LastIndex and raftLocalState.LastTerm as 5 to storage and replicatePeer reads them, but when initializing applyState, replicatePeer finds nothing and initializes applyState to 0. It will hit a bug since raftLocalState and applyState do not match.

Am I right? Or will this situation never happen and I am missing something?

storage: refactor tests

The command tests are very repetitive, with lots of boilerplate; there must be a better way.

storage: rename or split up kvstore package

The idea of the package is to provide abstractions for interacting with a low-level KV store. However, transaction and lock/write do somewhat different things, and it is really a package for lowering TinySQL abstractions to KV abstractions.

Corrupted snap files due to hard linking received snapshots

While working on a solution, I sometimes got errors related to applying a snapshot, complaining that some of the sst files have an invalid size of 0.

This is strange, as snapshots are validated on reception; if a snapshot passes reception, it should not have an invalid size.

After digging into the snapshot applying code path, I noticed that DB.IngestExternalFiles is just hardlinking the sst files into badger without removing the received file here:
https://github.com/pingcap-incubator/tinykv/blob/fc36c301fbf92d05e30e629453a7c3282399af75/kv/raftstore/snap/snap.go#L692

So what can happen is that the sst files get compacted as part of LSM compaction and their size is reduced to 0 (so compaction literally does an unmap/truncate to free space, I assume?). When the snapshot is re-sent, the follower finds that the snapshot already exists. Then, when the follower applies the snapshot, it blows up on the size mismatch.

We should either:

  1. make a copy when applying the snapshot, or, better,
  2. remove the link after ingesting the snapshot file (and the entire snapshot file info, including the manifest)?

I did approach 1) in my solution:
https://github.com/junlong-gao/tinykv/blob/0fc6c28acffee911a38b2c043809e6977d7d2822/kv/raftstore/snap/snap.go#L692
and the issue no longer surfaces.
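For concreteness, approach 1 amounts to copying the received sst into place instead of hard-linking it, so later compaction of the ingested file cannot affect the cached snapshot file. A sketch using only the standard library (names are illustrative):

// copySnapshotSST copies src to dst instead of hard-linking, so badger owns an
// independent file. Uses the standard library os and io packages.
func copySnapshotSST(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0644)
	if err != nil {
		return err
	}
	defer out.Close()

	if _, err := io.Copy(out, in); err != nil {
		return err
	}
	return out.Sync() // make sure the copy is durable before ingesting it
}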

Could the PingCAP team please take a look at this and validate whether my theory makes sense?

`newTestPeerStorageFromEnts` does not modify applyState in memory, only in the DB

It only changes applyState in the DB:
https://github.com/pingcap-incubator/tinykv/blob/1df74d600cb0c5001f32d2d21d6f89e9744ee3e0/kv/raftstore/peer_storage_test.go#L34-L40

but the provided functions PeerStorage.truncatedIndex and PeerStorage.truncatedTerm use the in-memory applyState:
https://github.com/pingcap-incubator/tinykv/blob/1df74d600cb0c5001f32d2d21d6f89e9744ee3e0/kv/raftstore/peer_storage.go#L219-L221

So in TestPeerStorageTerm tests 2 and 3, Term will return ErrCompacted, since truncatedIndex is initially 5.

Should we modify the PeerStorage.truncatedIndex and PeerStorage.truncatedTerm functions?
